How do I copy escaped tabs and new lines into Vertica?

rafecorafeco Registered User
I have data that I've exported from MySQL that I'm trying to load into Vertica using COPY. The file is tab-separated, and the tabs within the fields are escaped by MySQL using the standard \t string. New lines in fields are escaped using \n.

When I load them into Vertica, the \ characters are stripped and they're just loaded as "n" or "t". I'd like to find an approach where Vertica treats them as tabs and new lines as they should be. Is there some COPY syntax that I can use, or some transformation I can perform on the export file to get this to work?

Comments

  • Hi,

    If I interpret your question correctly, the problem is that the \t and \n are literal strings vs. actual tab and newline chars. Vertica only allows you to have/specify a single char as a field delimiter or record terminator. It does have the provision to tell the copy that the next char is escaped using the uppercase E as illustrated below:

    # here's the source file with real tab and newline chars as field and record delimiter/terminators.
    [[email protected] ~]$ cat testtab.txt
    1       one
    2       two
    3       three

    # here's a copy using \t and \n which errors because they are multiple character literal strings
    adserver=> copy testabload from '/home/dbadmin/testtab.txt' no escape delimiter '\t' record terminator '\n' exceptions '/home/dbadmin/testtabxept.txt';
    ERROR 2730:  COPY delimiter must be a single character
    # and here's the successful copy after adding the E to the delimiter and record terminator args forcing copy to treat them as operators vs. literals
    adserver=> copy testabload from '/home/dbadmin/testtab.txt' no escape delimiter E'\t' record terminator E'\n' exceptions '/home/dbadmin/testtabxept.txt';
               3

    # now here's what I think you may have for a file
    [[email protected] ~]$ cat testslasht.txt
    1\tone\n2\ttwo\n3\tthree\n

    # same multiple char error
    adserver=> copy testabload from '/home/dbadmin/testslasht.txt' no escape delimiter '\t' record terminator '\n' exceptions '/home/dbadmin/testslashtxept.txt';
    ERROR 2730:  COPY delimiter must be a single character

    # but in this case the escape E doesn't help because the chars in the file are literals not their respective operators
    adserver=> copy testabload from '/home/dbadmin/testslasht.txt' no escape delimiter E'\t' record terminator E'\n' exceptions '/home/dbadmin/testslashtxept.txt';
               0
    # and here's the exceptions file showing that it didn't interpret the file content as delimited
    adserver=> \! cat /home/dbadmin/testslashtxept.txt
    COPY: Input record 1 has been rejected (Invalid integer format '1\tone\n2\ttwo\n3\tthree\n' for column 1 (c1)).  Please see /home/dbadmin/adserver/v_adserver_node0001_catalog/CopyErrorLogs/testabload-testslasht.txt-copy-from-rejected-data, record 1 for the rejected record.
    COPY: Loaded 0 rows, rejected 1 rows.

    So you may have to preprocess the file using sed or awk to convert the "\t" and "\n" to real tab and newline chars

    [[email protected] ~]$ cat testslasht.txt
    1\tone\n2\ttwo\n3\tthree\n

    [[email protected] ~]$ cat testslasht.txt | sed -e 's/\\t/     /g' -e 's/\\n/\n/g'
    1       one
    2       two
    3       three

    I hope it helps.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file