How to copy data from hive table (with partitions) to vertica table?
I have data in a table in hive which has a single partition (date string). I am using the COPY command in vsql to move the data from there (with webhdfs URL for hive table location) to a table in vertica. I am faced with the two scenarios -
1. Vertica table has no partition column (so there is one column less than hive table) - In this case, when I issue COPY command, the data gets copied without issues.
2. Vertica table is also partitioned with date string column (hive and vertica have same number of columns now) - When I try to COPY in this case, all the records get rejected and copy fails. Here is the error that I get - ERROR 2006: COPY: 10000 records have been rejected
The exact COPY command that I am issuing goes like this -
COPY vertica_table SOURCE Hdfs(url='http:url:50070/webhdfs/v1/user/warehouse/hive_db.db/hive_table_name/date=20170530/*') DELIMITER E'\001' NULL AS '' NO ESCAPE REJECTMAX 10000 DIRECT;
Can someone please help on how to get the data copied with partitions?
Thanks!
Best Answer
-
swalkaus Vertica Employee Employee
You can use a constant value for the partition column.
If "vertica_table" has two columns, "col1" and the partition column, named "date", the "COPY" statement would look something like this:
COPY vertica_table(col1, "date" AS '20170530') SOURCE Hdfs(...) DELIMITER E'\001' NULL AS '' NO ESCAPE REJECTMAX 10000 DIRECT;
For more details see https://my.vertica.com/docs/latest/HTML/index.htm#Authoring/AdministratorsGuide/BulkLoadCOPY/IgnoringColumnsAndFieldsInTheLoadFile.htm.
5
Answers
Thanks a lot @swalkaus ! The above solution worked for me.