Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

Storing source file names when loading data

You can manipulate fields when loading data with the COPY command, but did you know you can also store the source file name? Let's see how:

Suppose I have hourly data files in the pattern hourlyXX.csv where XX is the hour from 00-23. I want to load them into a single staging table.

Let's define the staging table:

dbadmin=> create table hourly (filename VARCHAR, symbol VARCHAR, tm TIME, q1 int, q2 int, q3 int);

COPY data with "filename" set to the source file name or CURRENT_LOAD_SOURCE() and use a wildcard to match all "hourly" files:

dbadmin=> COPY hourly (filename AS CURRENT_LOAD_SOURCE(), symbol, tm, q1, q2, q3) FROM '/tmp/hourly*.csv';

Now let's see what we loaded:

dbadmin=> select * from hourly ;
filename | symbol | tm | q1 | q2 | q3
-------------------+--------+----------+----+----+----
/tmp/hourly00.csv | AAA | 00:01:02 | 1 | 2 | 3
/tmp/hourly00.csv | BBB | 00:02:04 | 1 | 2 | 3
/tmp/hourly01.csv | AAA | 01:01:02 | 1 | 2 | 3
/tmp/hourly01.csv | BBB | 01:02:04 | 1 | 2 | 3
/tmp/hourly02.csv | AAA | 02:01:02 | 1 | 2 | 3
/tmp/hourly02.csv | BBB | 02:02:04 | 1 | 2 | 3

This is another way Vertica can help process data on load and avoid multiple staging steps. Let us know how it works for you!

Reference:
https://www.vertica.com/docs/10.1.x/HTML/Content/Authoring/SQLReferenceManual/Functions/CurrentLoad/CURRENT_LOAD_SOURCE.htm

Tagged:
Sign In or Register to comment.

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.