copy failing with quoted empty strings to datetime

We are new to vertica and are trying to convert out application from IQ over to Vertica ( v9.0.0-0). Issue we are having is getting the copy command to work with pipe delimited quoted strings. We get a feed from a government site (over 300 columns wide) and monthly refresh files over 5 million records and weekly update files about 10-20K records. Both monthly and weekly files are pipe delimited but the issue is weekly files have quoted strings. The copy command works on monthly but fails on weekly feed with quoted strings. Best I can understand is that because the copy command sees quotes it is thinking character string so empty character string is the result which does not work into a datetime datatype but it is ok for || to go to null for a datetime datatype.
So what is the solution to this? One workaround I can see is for us to edit the file before hand to remove all the quoted string which could potentially cause issues if the "" is around an embedded delimiter. We also found another 'solution' someone was discussing in 2016 where each column needed to be named out and case statements used to convert the empty string to null for specific columns. This is not exactly pretty with how wide this file is.
https://dba.stackexchange.com/questions/128070/copy-in-tsv-data-enclosed-by-quotes-treat-empty-strings-as-nulls
I guess another would be load into a temp table of all varchar and then have a separate insert statement with cases statements.
Is there a way to tell the copy command to convert empty strings to null so it can then be inserted into a date or datetime datatype?

Example code we came up with to demonstrate it.

create table VerticaCopyTest(
int_field integer,
string_field varchar(10) null,
timestamp_field1 datetime null,
timestamp_field2 datetime null);

COPY VerticaCopyTest(int_field, string_field, timestamp_field1, timestamp_field2)
FROM LOCAL '/stage/data/verticacopytest_pipedelimited.csv' DELIMITER '|' DIRECT;

COPY VerticaCopyTest(int_field, string_field, timestamp_field1, timestamp_field2)
FROM LOCAL '/stage/data/verticacopytest.csv' DELIMITER ',' ENCLOSED BY '"' DIRECT;

verticacopytest_pipedelimited.csv is
23|firstval||

and verticacopytest.csv is -- this file fails
"24","secondval","",""

Comments

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited January 2018

    Hi,

    Welcome to the wonderful world of Vertica :smiley:

    Take a look at the built-in CSV file parser. It should be able to load your data without modification.

    Example:

    dbadmin=> \! cat /home/dbadmin/verticacopytest.csv
    "24","secondval","",""
    
    dbadmin=> \d VerticaCopyTest
                                                 List of Fields by Tables
     Schema |      Table      |      Column      |    Type     | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-----------------+------------------+-------------+------+---------+----------+-------------+-------------
     public | VerticaCopyTest | int_field        | int         |    8 |         | f        | f           |
     public | VerticaCopyTest | string_field     | varchar(10) |   10 |         | f        | f           |
     public | VerticaCopyTest | timestamp_field1 | timestamp   |    8 |         | f        | f           |
     public | VerticaCopyTest | timestamp_field2 | timestamp   |    8 |         | f        | f           |
    (4 rows)
    
    dbadmin=> COPY VerticaCopyTest FROM '/home/dbadmin/verticacopytest.csv' PARSER fcsvparser(HEADER='false') DIRECT;
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM VerticaCopyTest;
     int_field | string_field | timestamp_field1 | timestamp_field2
    -----------+--------------+------------------+------------------
            24 | secondval    |                  |
    (1 row)
    

    See:
    https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/FlexTables/FCSVPARSERreference.htm

  • very good. we will lookup fcsvparser and get to know it. Thank you for your help.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Fyi... The fcsvparser parser performs a bit slower than the default parser.

    Another option for you is to use the FILLER parameter like so:

    dbadmin=> TRUNCATE TABLE VerticaCopyTest;
    TRUNCATE TABLE
    
    dbadmin=> \! cat /home/dbadmin/verticacopytest.csv
    "24","secondval","",""
    
    dbadmin=> COPY VerticaCopyTest (int_field, string_field, timestamp_field1_f FILLER VARCHAR(20), timestamp_field2_f FILLER VARCHAR(20), timestamp_field1 AS DECODE(timestamp_field1_f, '', NULL, timestamp_field1_f)::TIMESTAMP, timestamp_field2 AS DECODE(timestamp_field2_f, '', NULL, timestamp_field2_f)::TIMESTAMP) FROM '/home/dbadmin/verticacopytest.csv' ENCLOSED BY '"' DELIMITER ',' DIRECT;
     Rows Loaded
    -------------
               1
    (1 row)
    
    dbadmin=> SELECT * FROM VerticaCopyTest;
     int_field | string_field | timestamp_field1 | timestamp_field2
    -----------+--------------+------------------+------------------
            24 | secondval    |                  |
    (1 row)
    

    See:
    https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AdministratorsGuide/BulkLoadCOPY/IgnoringColumnsAndFieldsInTheLoadFile.htm

  • shayzshayz Vertica Customer

    Hi,
    Can I use FILLER with fcsvparser ?

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file