Load data from Amazon S3 bucket to Vertica
Hi, Requirement: Load data from Amazon S3 bucket to Vertica Database. Current Setup: Three node cluster hosted on AWS Version: Vertica 6.1.1 OS: Linux CentOS 6 Database and tables created. Data loaded through COPY command from vsql client Some of the data files are resided on S3, this has to be loaded to Database. Installed 'fuse' and mounted the S3 on one of the clusters(linux server) on which the Vertica is hosted. Using s3cmd command I am able to view all my files which is residing in the bucket. Issue: When I run the below command to load the data from S3, it says the file is not available Seems like it is not recognizing the S3 filesystem Command used to load data from S3: COPY SAM_REP_SCHEMA.UN_MOB_MM_IOS_DATA FROM 's3://samevalbucket/mediadata/UN_MOB_MM_IOS_DATA _25Apr13.CSV' DELIMITER '|' ENCLOSED BY '"' NO ESCAPE NULL AS '\N' EXCEPTIONS '/tmp/UN_MOB_MM_IOS_DATA_25Apr13_Excep.txt' REJECTED DATA '/tmp/UN_MOB_MM_IOS_DATA_25Apr13_Rejct.txt' ON v_samdb_node0001 ; Error: ERROR 2886: Could not open file [s3n://samevalbucket/mediadata/UN_MOB_MM_IOS_DATA _25Apr13.CSV] for reading; No such file or directory Questions: Is there any variable I need to set before loading from S3? Is this the right command to load data from S3 into Vertica? Please suggest...
0
Comments
What options and their benefits/limitations are to upload data to Vertica? Is S3 the only method? As I understand S3 also costs something. Is there some good way to export data from EC2 PostgreSQL instance to Vertica on EC2, that would be quick and not very costly?
Why don't just mount the S3 bucket ?
See example:
-create mount point /s301 and mount.
- create table and load data from S3 bucket.
- the rest is with you ! Is importnat that ou have your S3 bucket on the same Region as your AWS instances.
Loading data to Vertica from S3 bucket become very simple. At this point you do not need to have data on node. So there is no need to do anything with s3getfile / s3cmd / s3fs. Starting from version 7.2.2 Vertica shipped with Vertica Library for Amazon Web Services. This library will allow you to use URL to your data file directly in the COPY command.
I recommend to revisit the architecture of our loader. I am sure you will be able to simplify your current environment significantly. Plus your COPY commands will be much more simple too.
I blogged about it recently. Here is the link to a post which will provide you with all details you need: http://www.dbjungle.com/loading-data-from-aws-s3-bucket-to-vertica-made-easy/