How do I copy escaped tabs and new lines into Vertica?
I have data that I've exported from MySQL that I'm trying to load into Vertica using COPY. The file is tab-separated, and the tabs within the fields are escaped by MySQL using the standard \t string. New lines in fields are escaped using \n.
When I load them into Vertica, the \ characters are stripped and they're just loaded as "n" or "t". I'd like to find an approach where Vertica treats them as tabs and new lines as they should be. Is there some COPY syntax that I can use, or some transformation I can perform on the export file to get this to work?
When I load them into Vertica, the \ characters are stripped and they're just loaded as "n" or "t". I'd like to find an approach where Vertica treats them as tabs and new lines as they should be. Is there some COPY syntax that I can use, or some transformation I can perform on the export file to get this to work?
0
Comments
If I interpret your question correctly, the problem is that the \t and \n are literal strings vs. actual tab and newline chars. Vertica only allows you to have/specify a single char as a field delimiter or record terminator. It does have the provision to tell the copy that the next char is escaped using the uppercase E as illustrated below:
# here's the source file with real tab and newline chars as field and record delimiter/terminators.
[dbadmin@n4 ~]$ cat testtab.txt
1 one
2 two
3 three
# here's a copy using \t and \n which errors because they are multiple character literal strings
adserver=> copy testabload from '/home/dbadmin/testtab.txt' no escape delimiter '\t' record terminator '\n' exceptions '/home/dbadmin/testtabxept.txt';
ERROR 2730: COPY delimiter must be a single character
# and here's the successful copy after adding the E to the delimiter and record terminator args forcing copy to treat them as operators vs. literals
adserver=> copy testabload from '/home/dbadmin/testtab.txt' no escape delimiter E'\t' record terminator E'\n' exceptions '/home/dbadmin/testtabxept.txt';
3
# now here's what I think you may have for a file
[dbadmin@n4 ~]$ cat testslasht.txt
1\tone\n2\ttwo\n3\tthree\n
# same multiple char error
adserver=> copy testabload from '/home/dbadmin/testslasht.txt' no escape delimiter '\t' record terminator '\n' exceptions '/home/dbadmin/testslashtxept.txt';
ERROR 2730: COPY delimiter must be a single character
# but in this case the escape E doesn't help because the chars in the file are literals not their respective operators
adserver=> copy testabload from '/home/dbadmin/testslasht.txt' no escape delimiter E'\t' record terminator E'\n' exceptions '/home/dbadmin/testslashtxept.txt';
0
# and here's the exceptions file showing that it didn't interpret the file content as delimited
adserver=> \! cat /home/dbadmin/testslashtxept.txt
COPY: Input record 1 has been rejected (Invalid integer format '1\tone\n2\ttwo\n3\tthree\n' for column 1 (c1)). Please see /home/dbadmin/adserver/v_adserver_node0001_catalog/CopyErrorLogs/testabload-testslasht.txt-copy-from-rejected-data, record 1 for the rejected record.
COPY: Loaded 0 rows, rejected 1 rows.
So you may have to preprocess the file using sed or awk to convert the "\t" and "\n" to real tab and newline chars
[dbadmin@n4 ~]$ cat testslasht.txt
1\tone\n2\ttwo\n3\tthree\n
[dbadmin@n4 ~]$ cat testslasht.txt | sed -e 's/\\t/ /g' -e 's/\\n/\n/g'
1 one
2 two
3 three
I hope it helps.