Failed to load CSV file which as line break in quoted column

I am trying to load a CSV file which has a string column in quoted string. Unfortunately I have \n (new line) in the quoted string. Here is the example Cust_id, cust_address, city,zip 1, "1289 cobb parkway Bufford", "ATLANTA",34343 2, "1234 IVY lane Decatur", "ATLANTA",23435 I am trying to use following copy command COPY tempdb.test_cust FROM LOCAL 'test.dat' DELIMITER ',' ENCLOSED BY '"' NO ESCAPE SKIP 1 DIRECT; Above copy script fails with error message vsql:test.vsql:24: ERROR 2035: COPY(tempdb.test_cust): Input record 1 has been rejected (Too few columns found) It appears to me line break in between quoted string is considered as record delimiter. Do you have any solution to load this type of delimited file using COPY?

Comments

  • Navin_CNavin_C Vertica Customer
    Hello Vijay, Welcome to Vertica forums. The solution to your problem can be in tow steps. Step 1 If you file is as mentioned pattern, then you need a record terminator on every alternate line Either you can do this in Linux using a simple awk command
      1, "1289 cobb parkway   Bufford", "ATLANTA",34343  2, "1234 IVY lane   Decatur", "ATLANTA",23435    awk '!(NR%2){$(NF+1)=";"}1' filename    1, "1289 cobb parkway   Bufford", "ATLANTA",34343;  2, "1234 IVY lane   Decatur", "ATLANTA",23435;  
    I have placed a record terminator ';' on every alternate line. Step2 Now coming to Vertica, as you have record terminator defined in the csv file now, you can use the record terminator function with the COPY command this way.
      COPY tempdb.test_cust  FROM LOCAL 'test.dat'  DELIMITER ','  RECORD TERMINATOR ';'  ENCLOSED BY '"'  NO ESCAPE  SKIP 1  DIRECT;    
    This should do the trick for you. Note - If doing this in two step might not be a good idea, then there is a package called "External Filter" on Github which we can use to do this in a single COPY command. With external filter you have full access to command line tools while using the COPY command. Hope this helps.. :)
  • Thanks Navin. I looked at record set. Not all quoted strings have line break. Some records have multiple line break in quoted string. I thought "enclosed by" phrase would not consider \n as record delimiter. Is there any other alternative?
  • Hi, "Enclosed by" with the default DELIMITED parser does not enclose record terminators. (There are reasons for this involving some interesting bits of cleverness; unfortunately I can't go into more detail right at the moment.) However, you can escape newlines. Add a backslash right before each newline (so as the last character on the line), rather than surrounding it with quotes. Alternatively, you can tell Vertica to use use a different parser. If you're comfortable compiling source code, our SDK examples (installed to /opt/vertica/sdk/examples/ by default) includes the Rfc4180 parser, which parses CSV files formatted according to the IETF RFC-4180 standard (including enclosed record terminators among various other things). After you've loaded the parser into Vertica, you call it from the COPY statement with "COPY ... WITH PARSER [new parser that you want to use]"; see the .sql files in the above directory for examples. Yes, we know this process could be simpler :-) Adam
  • Adam, I could compile SDK examples and use RFC4180 Parser. Thanks for your help.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file