Keeping Track of Data Load Sources
This tip was authored by Jim Knicely.
If you are like most companies, you will be loading data into a Vertica table from many different data files. And if you are like me, you’ll forget later which file a table record arrived in. Fortunately the CURRENT_LOAD_SOURCE function returns the file name used when executing a COPY statement and I can use it to keep a record of the data file used.
Example:
dbadmin=> CREATE TABLE daily_load (id INT, some_data1 VARCHAR(100), some_data2 VARCHAR(100), file_loaded VARCHAR(200));
CREATE TABLE
dbadmin=> COPY daily_load (id, some_data1, some_data2, file_loaded AS CURRENT_LOAD_SOURCE()) FROM '/home/dbadmin/test*_load.txt';
Rows Loaded
-------------
6
(1 row)
dbadmin=> SELECT * FROM daily_load;
id | some_data1 | some_data2 | file_loaded
----+------------+------------+------------------------------
1 | TEST1 | TEST2 | /home/dbadmin/test1_load.txt
2 | TEST10 | TEST20 | /home/dbadmin/test1_load.txt
3 | TEST100 | TEST200 | /home/dbadmin/test1_load.txt
4 | TEST3 | TEST4 | /home/dbadmin/test2_load.txt
5 | TEST30 | TEST40 | /home/dbadmin/test2_load.txt
6 | TEST300 | TEST400 | /home/dbadmin/test2_load.txt
(6 rows)
Have Fun!
0