We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Storing source file names when loading data — Vertica Forum

Storing source file names when loading data

Bryan_HBryan_H Vertica Employee Administrator

You can manipulate fields when loading data with the COPY command, but did you know you can also store the source file name? Let's see how:

Suppose I have hourly data files in the pattern hourlyXX.csv where XX is the hour from 00-23. I want to load them into a single staging table.

Let's define the staging table:

dbadmin=> create table hourly (filename VARCHAR, symbol VARCHAR, tm TIME, q1 int, q2 int, q3 int);

COPY data with "filename" set to the source file name or CURRENT_LOAD_SOURCE() and use a wildcard to match all "hourly" files:

dbadmin=> COPY hourly (filename AS CURRENT_LOAD_SOURCE(), symbol, tm, q1, q2, q3) FROM '/tmp/hourly*.csv';

Now let's see what we loaded:

dbadmin=> select * from hourly ;
filename | symbol | tm | q1 | q2 | q3
-------------------+--------+----------+----+----+----
/tmp/hourly00.csv | AAA | 00:01:02 | 1 | 2 | 3
/tmp/hourly00.csv | BBB | 00:02:04 | 1 | 2 | 3
/tmp/hourly01.csv | AAA | 01:01:02 | 1 | 2 | 3
/tmp/hourly01.csv | BBB | 01:02:04 | 1 | 2 | 3
/tmp/hourly02.csv | AAA | 02:01:02 | 1 | 2 | 3
/tmp/hourly02.csv | BBB | 02:02:04 | 1 | 2 | 3

This is another way Vertica can help process data on load and avoid multiple staging steps. Let us know how it works for you!

Reference:
https://www.vertica.com/docs/10.1.x/HTML/Content/Authoring/SQLReferenceManual/Functions/CurrentLoad/CURRENT_LOAD_SOURCE.htm

Tagged:
Sign In or Register to comment.