Storing source file names when loading data
You can manipulate fields when loading data with the COPY command, but did you know you can also store the source file name? Let's see how:
Suppose I have hourly data files in the pattern hourlyXX.csv where XX is the hour from 00-23. I want to load them into a single staging table.
Let's define the staging table:
dbadmin=> create table hourly (filename VARCHAR, symbol VARCHAR, tm TIME, q1 int, q2 int, q3 int);
COPY data with "filename" set to the source file name or CURRENT_LOAD_SOURCE() and use a wildcard to match all "hourly" files:
dbadmin=> COPY hourly (filename AS CURRENT_LOAD_SOURCE(), symbol, tm, q1, q2, q3) FROM '/tmp/hourly*.csv';
Now let's see what we loaded:
dbadmin=> select * from hourly ;
filename | symbol | tm | q1 | q2 | q3
-------------------+--------+----------+----+----+----
/tmp/hourly00.csv | AAA | 00:01:02 | 1 | 2 | 3
/tmp/hourly00.csv | BBB | 00:02:04 | 1 | 2 | 3
/tmp/hourly01.csv | AAA | 01:01:02 | 1 | 2 | 3
/tmp/hourly01.csv | BBB | 01:02:04 | 1 | 2 | 3
/tmp/hourly02.csv | AAA | 02:01:02 | 1 | 2 | 3
/tmp/hourly02.csv | BBB | 02:02:04 | 1 | 2 | 3
This is another way Vertica can help process data on load and avoid multiple staging steps. Let us know how it works for you!