How to escape Special characters while using export to delimited.

In my product we are using vertica version 12.04 where we are using export to delimited to export csv into s3 bucket. This is a sample data :-
select * from srijan;
id | name
----+------------------------
1 | srijan chakraborty
2 | Mr, srijan chakraborty
3 | X"≫
4 | "lakshman"'

How to get proper valid parsable csv while doing export to delimited with delimiter as , ?

Answers

  • marcothesanemarcothesane - Select Field - Administrator

    Have you looked at the EXPORT TO DELIMITED() command?

    Check this out :
    https://docs.vertica.com/23.4.x/en/sql-reference/statements/export-to-delimited/

  • marcothesanemarcothesane - Select Field - Administrator

    What you showed in your question is the output of vsql when you launch it and send its output to the screen.
    When I export to delimited (in my case to a Posix directory, but the file would look the same in an S3 bucket), I would to this to get RFC compliant output:

    EXPORT TO DELIMITED (
      directory='/data/customer'
    , ifDirExists='overwrite'
    , delimiter=','
    , enclosedBy='"'
    , escapeAs='"'
    , addHeader='true'
    , filename='customer'
    )
    AS
    SELECT 
      c_id
    , c_from_dt
    , c_to_dt
    , c_fname
    , c_lname
    FROM scd.d_customer_scd
    ORDER BY c_id;
    

    ... to get this file:

    c_id,c_from_dt,c_to_dt,c_fname,c_lname
    1,"2024-03-02","9999-12-31","Arthur","Dent"
    2,"2024-03-02","9999-12-31","Ford","Prefect"
    3,"2024-02-02","9999-12-31","Zaphod","Beeblebrox"
    4,"2024-02-02","9999-12-31","Tricia","McMillan"
    5,"2024-03-02","9999-12-31","named ""Gag""","Halfrunt"
    6,"2024-03-02","9999-12-31","Prostetnic Vogon","Jeltz"
    7,"2024-03-02","9999-12-31","Lionel","Prosser"
    8,"2024-02-02","9999-12-31","Benji","Mouse"
    9,"2024-02-02","9999-12-31","Frankie","Mouse"
    10,"2024-03-02","9999-12-31","Wonko","The Sane"
    10,"2024-02-02","2024-03-02","Wonko","The Sane"
    10,"2024-01-02","2024-02-02","Wonko","The Sane"
    10,"2023-12-02","2024-01-02","Wonko","The Sane"
    10,"2023-11-02","2023-12-02","Wonko","The Sane"
    11,"2024-03-02","9999-12-31","Eccentrica","Gallumbitis"
    
    
  • under my case I get the following file on doing export to delimited EXPORT TO DELIMITED (directory='s3://vertica-s3-export-stg1/bench', delimiter=',', enclosedBy='"', escapeAs='"', addHeader='true', filename='test') AS SELECT * FROM (SELECT * from srijan) AS FinalUser;

    file output as following :-

    id,name
    1,"srijan chakraborty"
    2,"Mr", srijan chakraborty"
    3,"X""≫"
    
    
  • moshegmosheg Vertica Employee Administrator
    edited April 2

    Try this:

    select * from srijan order by 1;
    
     id |          name
    ----+------------------------
      1 | srijan chakraborty
      2 | Mr, srijan chakraborty
      3 | X"≫
      4 | "lakshman"'
    (4 rows)
    
    EXPORT TO DELIMITED (
            directory='/MYPATH/MYDIR',
            delimiter=',',
            escapeAs='\',
            addHeader='true',
            filename='test',
            ifDirExists='overwrite')
    AS SELECT * from srijan;
    
    \! cat /MYPATH/MYDIR/test.csv
    id,name
    1,srijan chakraborty
    4,"lakshman"'
    2,Mr\, srijan chakraborty
    3,X"≫
    
    truncate table srijan;
    
    COPY srijan
         FROM '/MYPATH/MYDIR/test.csv'
         ON v_eevdb_node0001
         SKIP 1
         DELIMITER ','
         ABORT ON ERROR;
    
    select * from srijan order by 1;
     id |          name
    ----+------------------------
      1 | srijan chakraborty
      2 | Mr, srijan chakraborty
      3 | X"≫
      4 | "lakshman"'
    (4 rows)
    
  • moshegmosheg Vertica Employee Administrator

    Because the table data itself can contain commas, vertical bars, new lines, slashes, and backslashes,
    it is advised to use a delimiter that consists of non-printable characters to export data.
    In the following example, we will use \001 (CTRL-A) as field separator.
    In addition, to avoid confusion between NULLs and EMPTY STRINGS, we can use the string REALNULL to represent NULLs in the output file.

    cat my_export_test.sql
    drop table if exists srijan cascade;
    create table srijan (id int, name varchar(100));
    
    COPY srijan FROM STDIN DELIMITER '|' ABORT ON ERROR;
    1|srijan chakraborty|
    2|Mr, srijan chakraborty|
    3|X"≫|
    4|"lakshman"'|
    \.
    
    select * from srijan order by 1;
    \! rm -f /MYPATH/MYDIR/test.csv
    
    EXPORT TO DELIMITED (
            directory='/MYPATH/MYDIR',
            delimiter=E'\001',
            escapeAs='\',
            addHeader='true',
            filename='test',
            nullAs='REALNULL',
            ifDirExists='overwrite')
    AS SELECT * from srijan;
    
    \! cat /MYPATH/MYDIR/test.csv
    
    truncate table srijan;
    
    COPY srijan
         FROM '/MYPATH/MYDIR/test.csv'
         ON v_eevdb_node0001
         SKIP 1
         DELIMITER E'\001'
         null 'REALNULL'
         ABORT ON ERROR;
    
    select * from srijan order by 1;
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file