Using sed to Handle Empty String NULL Values Enclosed in Control Characters

Jim_KnicelyJim_Knicely Administrator, Moderator, Employee, Registered User, VerticaExpert

You can load data files into a Vertica table using the COPY command. Your data can be delimited by control characters and separated by control characters. Unfortunately, if you also enclose NULL values as an empty string, you might run into a data cast issue.

Example:

dbadmin=> \d test1
                                  List of Fields by Tables
Schema | Table | Column |   Type    | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+--------+-----------+------+---------+----------+-------------+-------------
public | test1 | c1     | int       |    8 |         | f        | f           |
public | test1 | c2     | timestamp |    8 |         | f        | f           |
public | test1 | c3     | int       |    8 |         | f        | f           |
(3 rows)

dbadmin=> \! cat -v /home/dbadmin/test1.csv
^V1^V^G^V2019-03-13 18:55:58.073597^V^G^V1^V
^V2^V^G^V2019-03-13 18:55:58.073597^V^G^V2^V
^V3^V^G^V^V^G^V3^V
^V4^V^G^V2019-03-13 18:55:58.073597^V^G^V4^V

dbadmin=> COPY test1 FROM '/home/dbadmin/test1.csv' DELIMITER e'\x07' ENCLOSED e'\x16' NULL AS '' DIRECT REJECTED DATA TABLE test1_bad;
Rows Loaded
-------------
           3
(1 row) 
dbadmin=> SELECT * FROM test1;
c1 |             c2             | c3
----+----------------------------+----
  1 | 2019-03-13 18:55:58.073597 |  1
  2 | 2019-03-13 18:55:58.073597 |  2
  4 | 2019-03-13 18:55:58.073597 |  4
(3 rows)

dbadmin=> SELECT rejected_reason FROM test1_bad;
                                   rejected_reason
--------------------------------------------------------------------------------------
Invalid timestamp format '' for column 2 (c2).Invalid input syntax for timestamp: ""
(1 row)

One workaround is to clean up the source file using the Linux command sed to replace the enclosed empty string NULL values.

dbadmin=> TRUNCATE TABLE test1;
TRUNCATE TABLE

dbadmin=> \! cat -v /home/dbadmin/test1.csv
^V1^V^G^V2019-03-13 18:55:58.073597^V^G^V1^V
^V2^V^G^V2019-03-13 18:55:58.073597^V^G^V2^V
^V3^V^G^V^V^G^V3^V
^V4^V^G^V2019-03-13 18:55:58.073597^V^G^V4^V

dbadmin=> \! sed -ie "s/^V^V//" /home/dbadmin/test1.csv

dbadmin=> \! cat -v /home/dbadmin/test1.csv
^V1^V^G^V2019-03-13 18:55:58.073597^V^G^V1^V
^V2^V^G^V2019-03-13 18:55:58.073597^V^G^V2^V
^V3^V^G^G^V3^V
^V4^V^G^V2019-03-13 18:55:58.073597^V^G^V4^V

dbadmin=> TRUNCATE TABLE test1;
TRUNCATE TABLE

dbadmin=> COPY test1 FROM '/home/dbadmin/test1.csv' DELIMITER e'\x07' ENCLOSED e'\x16' NULL AS '' DIRECT REJECTED DATA TABLE test1_bad;
Rows Loaded
-------------
           4
(1 row)

dbadmin=> SELECT * FROM test1;
c1 |             c2             | c3
----+----------------------------+----
  1 | 2019-03-13 18:55:58.073597 |  1
  2 | 2019-03-13 18:55:58.073597 |  2
  3 |                            |  3
  4 | 2019-03-13 18:55:58.073597 |  4
(4 rows)

Helpful Links:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/BulkLoadCOPY/BulkLoadingData.htm
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/BulkLoadCOPY/IgnoringColumnsAndFieldsInTheLoadFile.htm
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Literals/ExtendedStringLiterals.htm

Have fun!

Sign In or Register to comment.