We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Importing data with Text Qualifiers — Vertica Forum

Importing data with Text Qualifiers

I am trying to import data into Vertica using the COPY command from a .txt file.

The file has field delimiter "\t", row delimiter "\n" and text qualifier "". So the data looks like this-

"Data1"       "Data2"       "Data3"        "Data1"       "Data2"       "Data3"        "Data1"       "Data2"       "Data3"        
"Data1"       "Data2"       "Data3"        "Data1"       "Data2"       "Data3"        "Data1"       "Data2"       "Data3"        "Data1"       "Data2"       "Data3"        

Notice how the data in one row spills over to the next line in the .txt file.
If I try to specify my delimiter as '"\t"', I get Error 2730: COPY delimiter must be a single character. Whats the work around to this?

I do not want to change the format of the exported data

Comments

  • You should be able to use the "E" to signify you want the specified delimiter escaped and treated as a literal. Example below for the tab field delimiter.

    ephemeraltest=> \! cat test.txt
    "data1" "data2" "data3"
    "data1a"        "data2a"        "data3"

    ephemeraltest=> copy test from '/home/dbadmin/test.txt' delimiter '\t';
    ERROR 2730:  COPY delimiter must be a single character
    ephemeraltest=> copy test from '/home/dbadmin/test.txt' delimiter E'\t';
     Rows Loaded
    -------------
               2

    I hope it helps


  • Hi Kunal,

    Several things here:

    First, strings in native standards-compliant SQL don't support backslash-escaping.  They don't need it -- all characters, including meta-characters, etc., are interpreted literally (except for the single-quote character, which is escaped by having two of them in a row).  So you probably want that string to be ['"    "'] (a literal tab; not sure how well that will render in HTML).

    Alternatively, if you want C-style backslash escape sequences, Vertica supports them via extended strings:  [E'"\t"']

    Second, what you refer to as a "qualifier" is what we call the "enclosing character".  I think you don't want [DELIMITER E'"\t"']; I think you want [DELIMITER E'\t' ENCLOSED BY'"'].

    In terms of data from one row spilling over from one line in the text file to the next -- it sounds to me like you do not in fact have a row delimiter?  You indicate a row delimiter "\n", but as you point out, I don't see any newlines in this example data.  Vertica does require a row delimiter (what we call a "record terminator"), and you must have one after each record.  This is how Vertica validates that it has correctly received a record, and recovers from invalid records.

    If you don't have a record terminator, but you have a format which would be amenable to inserting one, one option would be to compile and install the "ExternalFilter()" UDx from Vertica's github site (<http://github.com/vertica/Vertica-Extension-Packages/> -- in shell_load_package) and tell it to stream your data through "sed" and replace every third tab with a newline.  Note that, while this package is used by a number of Vertica users, it doesn't receive professional support/documentation; you have to figure it out yourself or by asking here on the forums.

    Another alternative, if you need custom parsing logic, would be to write a custom parser for your file format.  Vertica provides a "UDParser" API (C++ and Java) for extending the Vertica executable to support new parsing formats.  If you're interested in doing this, see "/opt/vertica/sdk/examples/ParserFunctions/" on any node in your Vertica cluster for some examples.

    Adam
  • Yes, but I want the \t AND the "" gone. The "" are there because my Data might have tabs in it, and I don't want that to be read as the field delimiter when its not.
  • Solved this - ENCLOSED BY feature of the COPY command

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file