copy failing with quoted empty strings to datetime
We are new to vertica and are trying to convert out application from IQ over to Vertica ( v9.0.0-0). Issue we are having is getting the copy command to work with pipe delimited quoted strings. We get a feed from a government site (over 300 columns wide) and monthly refresh files over 5 million records and weekly update files about 10-20K records. Both monthly and weekly files are pipe delimited but the issue is weekly files have quoted strings. The copy command works on monthly but fails on weekly feed with quoted strings. Best I can understand is that because the copy command sees quotes it is thinking character string so empty character string is the result which does not work into a datetime datatype but it is ok for || to go to null for a datetime datatype.
So what is the solution to this? One workaround I can see is for us to edit the file before hand to remove all the quoted string which could potentially cause issues if the "" is around an embedded delimiter. We also found another 'solution' someone was discussing in 2016 where each column needed to be named out and case statements used to convert the empty string to null for specific columns. This is not exactly pretty with how wide this file is.
https://dba.stackexchange.com/questions/128070/copy-in-tsv-data-enclosed-by-quotes-treat-empty-strings-as-nulls
I guess another would be load into a temp table of all varchar and then have a separate insert statement with cases statements.
Is there a way to tell the copy command to convert empty strings to null so it can then be inserted into a date or datetime datatype?
Example code we came up with to demonstrate it.
create table VerticaCopyTest(
int_field integer,
string_field varchar(10) null,
timestamp_field1 datetime null,
timestamp_field2 datetime null);
COPY VerticaCopyTest(int_field, string_field, timestamp_field1, timestamp_field2)
FROM LOCAL '/stage/data/verticacopytest_pipedelimited.csv' DELIMITER '|' DIRECT;
COPY VerticaCopyTest(int_field, string_field, timestamp_field1, timestamp_field2)
FROM LOCAL '/stage/data/verticacopytest.csv' DELIMITER ',' ENCLOSED BY '"' DIRECT;
verticacopytest_pipedelimited.csv is
23|firstval||
and verticacopytest.csv is -- this file fails
"24","secondval","",""
Comments
Hi,
Welcome to the wonderful world of Vertica
Take a look at the built-in CSV file parser. It should be able to load your data without modification.
Example:
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/FlexTables/FCSVPARSERreference.htm
very good. we will lookup fcsvparser and get to know it. Thank you for your help.
Fyi... The fcsvparser parser performs a bit slower than the default parser.
Another option for you is to use the FILLER parameter like so:
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AdministratorsGuide/BulkLoadCOPY/IgnoringColumnsAndFieldsInTheLoadFile.htm
Hi,
Can I use FILLER with fcsvparser ?
No. See:
https://forum.vertica.com/discussion/241146/can-i-use-filler-with-fcsvparser#latest