Keeping Track of Data Load Sources

This tip was authored by Jim Knicely.

If you are like most companies, you will be loading data into a Vertica table from many different data files. And if you are like me, you’ll forget later which file a table record arrived in. Fortunately the CURRENT_LOAD_SOURCE function returns the file name used when executing a COPY statement and I can use it to keep a record of the data file used.

Example:

dbadmin=> CREATE TABLE daily_load (id INT, some_data1 VARCHAR(100), some_data2 VARCHAR(100), file_loaded VARCHAR(200));
CREATE TABLE

dbadmin=> COPY daily_load (id, some_data1, some_data2, file_loaded AS CURRENT_LOAD_SOURCE()) FROM '/home/dbadmin/test*_load.txt';
Rows Loaded
-------------
           6
(1 row)

dbadmin=> SELECT * FROM daily_load;
 id | some_data1 | some_data2 |         file_loaded
----+------------+------------+------------------------------
  1 | TEST1      | TEST2      | /home/dbadmin/test1_load.txt
  2 | TEST10     | TEST20     | /home/dbadmin/test1_load.txt
  3 | TEST100    | TEST200    | /home/dbadmin/test1_load.txt
  4 | TEST3      | TEST4      | /home/dbadmin/test2_load.txt
  5 | TEST30     | TEST40     | /home/dbadmin/test2_load.txt
  6 | TEST300    | TEST400    | /home/dbadmin/test2_load.txt
(6 rows)

Have Fun!

Sign In or Register to comment.