Importing data with Text Qualifiers

Kunal_Agrawal · June 2014

I am trying to import data into Vertica using the COPY command from a .txt file.

The file has field delimiter "\t", row delimiter "\n" and text qualifier "". So the data looks like this-

"Data1" "Data2" "Data3" "Data1" "Data2" "Data3" "Data1" "Data2" "Data3"
"Data1" "Data2" "Data3" "Data1" "Data2" "Data3" "Data1" "Data2" "Data3" "Data1" "Data2" "Data3"

Notice how the data in one row spills over to the next line in the .txt file.
If I try to specify my delimiter as '"\t"', I get Error 2730: COPY delimiter must be a single character. Whats the work around to this?

I do not want to change the format of the exported data

[Deleted User] · June 2014

You should be able to use the "E" to signify you want the specified delimiter escaped and treated as a literal. Example below for the tab field delimiter.

ephemeraltest=> \! cat test.txt
"data1" "data2" "data3"
"data1a" "data2a" "data3"

ephemeraltest=> copy test from '/home/dbadmin/test.txt' delimiter '\t';
ERROR 2730: COPY delimiter must be a single character
ephemeraltest=> copy test from '/home/dbadmin/test.txt' delimiter E'\t';
Rows Loaded
-------------
2

I hope it helps

[Deleted User] · June 2014

Hi Kunal,

Several things here:

First, strings in native standards-compliant SQL don't support backslash-escaping. They don't need it -- all characters, including meta-characters, etc., are interpreted literally (except for the single-quote character, which is escaped by having two of them in a row). So you probably want that string to be ['" "'] (a literal tab; not sure how well that will render in HTML).

Alternatively, if you want C-style backslash escape sequences, Vertica supports them via extended strings: [E'"\t"']

Second, what you refer to as a "qualifier" is what we call the "enclosing character". I think you don't want [DELIMITER E'"\t"']; I think you want [DELIMITER E'\t' ENCLOSED BY'"'].

In terms of data from one row spilling over from one line in the text file to the next -- it sounds to me like you do not in fact have a row delimiter? You indicate a row delimiter "\n", but as you point out, I don't see any newlines in this example data. Vertica does require a row delimiter (what we call a "record terminator"), and you must have one after each record. This is how Vertica validates that it has correctly received a record, and recovers from invalid records.

If you don't have a record terminator, but you have a format which would be amenable to inserting one, one option would be to compile and install the "ExternalFilter()" UDx from Vertica's github site (<http://github.com/vertica/Vertica-Extension-Packages/> -- in shell_load_package) and tell it to stream your data through "sed" and replace every third tab with a newline. Note that, while this package is used by a number of Vertica users, it doesn't receive professional support/documentation; you have to figure it out yourself or by asking here on the forums.

Another alternative, if you need custom parsing logic, would be to write a custom parser for your file format. Vertica provides a "UDParser" API (C++ and Java) for extending the Vertica executable to support new parsing formats. If you're interested in doing this, see "/opt/vertica/sdk/examples/ParserFunctions/" on any node in your Vertica cluster for some examples.

Adam

Kunal_Agrawal · June 2014

Yes, but I want the \t AND the "" gone. The "" are there because my Data might have tabs in it, and I don't want that to be read as the field delimiter when its not.

Kunal_Agrawal · July 2014

Solved this - ENCLOSED BY feature of the COPY command

Importing data with Text Qualifiers

Comments

Leave a Comment