Importing data with Text Qualifiers
I am trying to import data into Vertica using the COPY command from a .txt file.
The file has field delimiter "\t", row delimiter "\n", and the text qualifier is the double-quote character ("). So the data looks like this:
"Data1" "Data2" "Data3" "Data1" "Data2" "Data3" "Data1" "Data2" "Data3"
"Data1" "Data2" "Data3" "Data1" "Data2" "Data3" "Data1" "Data2" "Data3" "Data1" "Data2" "Data3"
Notice how the data in one row spills over to the next line in the .txt file.
If I try to specify my delimiter as '"\t"', I get ERROR 2730: COPY delimiter must be a single character. What's the workaround for this?
I do not want to change the format of the exported data.
Comments
ephemeraltest=> \! cat test.txt
"data1" "data2" "data3"
"data1a" "data2a" "data3"
ephemeraltest=> copy test from '/home/dbadmin/test.txt' delimiter '\t';
ERROR 2730: COPY delimiter must be a single character
ephemeraltest=> copy test from '/home/dbadmin/test.txt' delimiter E'\t';
Rows Loaded
-------------
2
I hope it helps.
Several things here:
First, strings in native standards-compliant SQL don't support backslash-escaping. They don't need it -- all characters, including meta-characters, etc., are interpreted literally (except for the single-quote character, which is escaped by having two of them in a row). So you probably want that string to be ['" "'] (with a literal tab character between the double quotes; not sure how well that will render in HTML).
Alternatively, if you want C-style backslash escape sequences, Vertica supports them via extended strings: [E'"\t"']
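To make the difference concrete, here's a quick sketch you can run in vsql (assuming StandardConformingStrings is on, which I believe is the default):
SELECT 'a\tb';   -- standard string: four literal characters, with a backslash and a t in the middle
SELECT E'a\tb';  -- extended string: \t becomes a real tab, so this is a, tab, b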
Second, what you refer to as a "qualifier" is what we call the "enclosing character". I think you don't want [DELIMITER E'"\t"']; I think you want [DELIMITER E'\t' ENCLOSED BY '"'].
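For example, borrowing the table name and file path from the transcript above (substitute your own), a minimal sketch would be:
COPY test FROM '/home/dbadmin/test.txt' DELIMITER E'\t' ENCLOSED BY '"';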
As for data from one row spilling over from one line of the text file to the next -- it sounds to me like you do not, in fact, have a row delimiter. You indicate a row delimiter "\n", but as you point out, I don't see any newlines in this example data. Vertica does require a row delimiter (what we call a "record terminator"), and you must have one after each record. This is how Vertica validates that it has correctly received a record, and how it recovers from invalid records.
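For completeness, the record terminator can be spelled out explicitly in the COPY statement; newline is the default, so this clause is usually implicit (table and path are placeholders again):
COPY test FROM '/home/dbadmin/test.txt' DELIMITER E'\t' ENCLOSED BY '"' RECORD TERMINATOR E'\n';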
If you don't have a record terminator, but your format would be amenable to inserting one, one option is to compile and install the "ExternalFilter()" UDx from Vertica's github site (<http://github.com/vertica/Vertica-Extension-Packages/> -- in shell_load_package) and tell it to stream your data through "sed", replacing every third tab with a newline. Note that, while this package is used by a number of Vertica users, it doesn't receive professional support or documentation; you have to figure it out yourself or ask here on the forums.
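If installing the UDx is more than you want to take on, the same reshaping can be done as a one-off preprocessing step before the COPY instead of streaming through the filter. A rough sketch -- file names are placeholders, and it assumes every logical row has exactly three fields (\! is vsql's shell escape, as in the transcript above):
\! tr '\t' '\n' < /home/dbadmin/test_wide.txt | paste -d'\t' - - - > /home/dbadmin/test_rows.txt
COPY test FROM '/home/dbadmin/test_rows.txt' DELIMITER E'\t' ENCLOSED BY '"';
tr puts every field on its own line, then paste glues them back together three at a time with tabs, so each logical row ends up on one line with a newline terminator that COPY can use.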
Another alternative, if you need custom parsing logic, would be to write a custom parser for your file format. Vertica provides a "UDParser" API (C++ and Java) for extending the Vertica executable to support new parsing formats. If you're interested in doing this, see "/opt/vertica/sdk/examples/ParserFunctions/" on any node in your Vertica cluster for some examples.
Adam