COPY command with multi-character delimiters



I have data in the below-mentioned format:

1234567^~^1234567^~^1234567^~^2004-02-02

The file has multiple rows of data in the same format.


Can you tell me how to load it with the COPY command, as Vertica allows only a SINGLE-character delimiter?


I can't use a file parser, as the file is huge and parsing it would take an immense amount of time.



Any inputs would be highly appreciated.





    Create your data file.

    (dbadmin@:5433) [dbadmin] > \! cat /tmp/file.dat 
    1234567^~^1234567^~^1234567^~^2004-02-02

    Create the table

    (dbadmin@:5433) [dbadmin] > \d test
    List of Fields by Tables
    Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
    public | test | col1 | varchar(20) | 20 | | f | f |
    public | test | col2 | varchar(20) | 20 | | f | f |
    public | test | col3 | varchar(20) | 20 | | f | f |
    public | test | col4 | varchar(20) | 20 | | f | f |
    (4 rows)

    Create the COPY command

    copy test(
    v1 filler varchar(20),
    v2 filler varchar(20),
    v3 filler varchar(20),
    v4 filler varchar(20),
    col1 as TRIM(BOTH '^' FROM v1) ,
    col2 as TRIM(BOTH '^' FROM v2),
    col3 as TRIM(BOTH '^' FROM v3),
    col4 as TRIM(BOTH '^' FROM v4) ) from '/tmp/file.dat' delimiter '~' direct;

    - You just need a single-character delimiter when loading, but you can transform the rest using FILLER columns and the TRIM built-in function.
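As a rough illustration outside Vertica, the FILLER + TRIM approach can be emulated in plain shell (the file name and sample row are illustrative, not the poster's actual data):

```shell
#!/bin/sh
# Hypothetical sample row using the ^~^ multi-character delimiter.
printf '1234567^~^1234567^~^1234567^~^2004-02-02\n' > /tmp/file.dat

# What the COPY statement does conceptually: split on the single
# character ~, then strip the surrounding ^ from every field,
# which is what TRIM(BOTH ...) does for each filler column.
awk -F'~' '{
    for (i = 1; i <= NF; i++)
        gsub(/^\^|\^$/, "", $i)    # drop leading/trailing ^
    print $1, $2, $3, $4
}' OFS=' | ' /tmp/file.dat
# prints: 1234567 | 1234567 | 1234567 | 2004-02-02
```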


    Check the data

    (dbadmin@:5433) [dbadmin] > select * from test;
    col1 | col2 | col3 | col4
    1234567 | 1234567 | 1234567 | 2004-02-02
    (1 row)


    I hope this was what you were looking for :)



    Thanks Adrian_Oprea_1!! Your solution worked perfectly for the above example. Using fillers was a great idea.


    We are purposely using a multi-character delimiter (^~^) because the columns may contain data that includes a tilde (~). However, there is no support for multi-character delimiters in Vertica.


    Is there any way we can ignore tildes inside the column data?





    Looking forward to your response.


    I don't understand your question. Could you rephrase it?


    Hi Adrian,


    What I meant to ask is this: the data might contain special characters, and if one of those characters matches the delimiter, that ENTIRE ROW ends up in the rejected data.


    So can you tell me how to handle this scenario, i.e. what to do when the delimiter is part of the data?







    Hope this makes it clear. Thanks.



    I faced this in the past. I had to sweat a bit to fix it, but I could not do it dynamically; I had to do it by hand.

    I suggest solving this at the "root", where the data is generated, and making sure you have suitable delimiters in place.


    Well, I too faced the same issue, but unfortunately I don't have the authority to alter the data content. ;-)


    So after a little groundwork, we built a parser that can handle multi-character delimiters. Once you deploy it, you can pass it to the COPY command as a parameter, and it works perfectly.



    Thanks again for the help Adrian. 


    Could you share it with the community?

    marcothesane (Administrator)


    Hi tangiblegeek -

    If you can influence the creation of the load file - consider using string delimiters/enclosers:

    Instead of:

    1234567^~^1234567^~^1234567^~^2004-02-02

    you would have:

    '1234567'~'1234567'~'1234567'~'2004-02-02'
    And your COPY command would be:


    COPY <mytable> FROM LOCAL '<mytable.tsv>' DELIMITER '~' ENCLOSED BY '''' DIRECT;


    If the encloser character can occur within the string itself, it would need to be escaped with a backslash ...
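A small shell sketch of that escaping step (the field values and the `enclose` helper are made up for illustration; this only prepares the line, it does not run the COPY):

```shell
#!/bin/sh
# Wrap one field value in single-quote enclosers, backslash-escaping
# any single quote that occurs inside the value itself.
enclose() {
    esc=$(printf '%s' "$1" | sed "s/'/\\\\'/g")   # turn ' into backslash-quote
    printf "'%s'" "$esc"
}

# Hypothetical row: the second field contains the encloser character.
line="$(enclose '1234567')~$(enclose "O'Brien")~$(enclose '2004-02-02')"
printf '%s\n' "$line"
# prints: '1234567'~'O\'Brien'~'2004-02-02'
```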

    Good luck


    (I work for HP) ... 


    Hi Marco, 


    Unfortunately, I don't have access to alter the data, but there is a fix that we eventually put in place, and it has been working pretty well so far, so I decided to share it with the community.



    We are using a shell script: with 'awk' and 'sed' we replace the multi-character delimiter with a single-character delimiter while loading. The file gets loaded perfectly, and the original file itself is not modified.


    The flow is as follows:

    1. UDX calls shell script

    2. Newly constructed COPY statement gets fired at command prompt

    3. Delimiter gets replaced 'on-the-fly' (actual file is still the same)

    4. Shell returns control to UDX for remaining steps
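Steps 2-3 above might be sketched as follows, assuming a Ctrl-A byte that never occurs in the data; the vsql invocation in the comment is illustrative only (the poster's actual script was not shared):

```shell
#!/bin/sh
# Illustrative input file; the real file stays untouched on disk.
printf '1234567^~^1234567^~^1234567^~^2004-02-02\n' > /tmp/file.dat

# Pick a single-character delimiter that cannot occur in the data,
# e.g. the Ctrl-A byte (octal 001).
SEP=$(printf '\001')

# Replace the multi-character delimiter ^~^ on the fly; the stream,
# not the file, is what the COPY would consume, e.g.:
#   sed "s/\^~\^/$SEP/g" /tmp/file.dat | \
#     vsql -c "COPY test FROM STDIN DELIMITER E'\001' DIRECT"
sed "s/\^~\^/$SEP/g" /tmp/file.dat > /tmp/file.singlechar.dat
```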


    Hope this works for you guys.
