COPY many flat files to one table and save the file name to a column in the table?
I use COPY to load many flat files to one table. I want to save the file name to one column of the table in order to verify the data later. Below is an example of my COPY command. But I do not know how to save the file name to one column of the table in Vertica. Anybody has any ideas? thanks.
\set t_pwd 'pwd'
\set input_file '\'':t_pwd'/flat/*Payer.txt\''
truncate table _dicPayer;
copy _dicPayer
(
ID,
PAYERID,
PAYERNAME,
INSERTDATETIME
)
from local :input_file
escape as E'\001' delimiter E'|' skip 0 exceptions '/tmp/pb_ex' rejected data '/tmp/pb_rej' direct;
\set t_pwd 'pwd'
\set input_file '\'':t_pwd'/flat/*Payer.txt\''
truncate table _dicPayer;
copy _dicPayer
(
ID,
PAYERID,
PAYERNAME,
INSERTDATETIME
)
from local :input_file
escape as E'\001' delimiter E'|' skip 0 exceptions '/tmp/pb_ex' rejected data '/tmp/pb_rej' direct;
0
Comments
You can do this using a shell script Hope this helps
But the result is not what I need. I include * in the input file name as following example in order to import a lot of files once.
\set t_pwd 'pwd'
\set input_file '\'':t_pwd'/flat/aetname/*_MED_FILE.TXT\''
The result is like below. The symbol * is NOT converted the real file name.
/data/vertica/clinical/covenant/flat/aetname/*_MED_FILE.TXT
If we are going to loop the files on OS level, we probably save the real file name into the table. Do you know how to loop the file in Linux or you have another easy idea?
thanks.
Actually, there's another way to do this: Vertica provides a built-in function "current_load_source()" that you can use as part of a COPY expression to extract the current filename: Let us know if that helps.
Alternatively, if you do want to iterate over files as part of a shell script, you could do something like: However, this will cause Vertica to only see one file to copy at a time. If you give us multiple files at once, we will parse them in parallel, resulting in improved performance.
Adam
Thanks Adam, This is something new I got to learn
Search returned this "Your search for "current_load_source" returned 0 result(s)."
I did a test: Regards.
Good to know!
Be aware it require a FROM clause, without FROM clause function will return <unknown>.
nvedwm1=> select version();
version
------------------------------------
Vertica Analytic Database v7.1.1-0
COPY cp_test (reconciliation_file_name AS current_load_source(), batch_id AS 123 ,file_name, records, net_pay_total, date_sent, total_table_count) FROM LOCAL '/landing/cp_test.txt' DELIMITER '|' TRAILING NULLCOLS EXCEPTIONS 'e.txt' REJECTED DATA 'r.txt' ABORT ON ERROR;
Rows Loaded
-------------
35
select reconciliation_file_name from cp_test;
--------------------------
Batch No. 1
Batch No. 1
Batch No. 1
...
LOCAL
LOCAL it is a streaming, i.e. its similar to Example:
That's not the issue, LOCAL, in here.
My version is different, so it seems it was not supported there?
I think some issues are with doc (as this function can be found in doc for 7.0) but not in 7.1.*
And as in my example, it does not work in my version either
What is your environment? Everything is works.
What is your environment? Everything is works.
daniel@synapse:~$ /opt/vertica/bin/vertica -V
Vertica Analytic Database v7.1.1-0
vertica(v7.1.1-0) built by release@build2.verticacorp.com from releases/VER_7_1_RELEASE_BUILD_1_0_20141016@148158 on 'Thu Oct 16 15:55:09 America/New_York 2014' $BuildId$
Sorry missed a command:
daniel@synapse:~$ cat /tmp/data.txt
1|foo
daniel=> copy LoadTest(rowid, value, source as current_load_source()) from '/tmp/data.txt' direct abort on error;
Rows Loaded
-------------
1
(1 row)
daniel=> select * from LoadTest ;
rowid | value | source
-------+-------+----------
1 | foo | data.txt
(1 row)
nvedwm1=> \! cat /etc/*-release
3Q2014a
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Red Hat Enterprise Linux Server release 6.5 (Santiago)
Red Hat Enterprise Linux Server release 6.5 (Santiago)
Here is exact the same sample as you did:
nvedwm1=> CREATE TABLE IF NOT EXISTS LoadTest (
nvedwm1(> rowid int,
nvedwm1(> value varchar,
nvedwm1(> source varchar
nvedwm1(> );
CREATE TABLE
nvedwm1=> select * from LoadTest ;
rowid | value | source
-------+-------+--------
(0 rows)
nvedwm1=> \! vi data01.txt
nvedwm1=>
nvedwm1=> \! cat data01.txt
1|foo
nvedwm1=> copy LoadTest(rowid, value, source as current_load_source()) from local 'data01.txt' direct abort on error;
Rows Loaded
-------------
1
(1 row)
nvedwm1=> select * from LoadTest;
rowid | value | source
-------+-------+-------------
1 | foo | Batch No. 1
(1 row)
Don't know what to say, some weird behavior. I tested on Ubuntu 14.04 LTS.
I have no RHEL subscription, I will try to test on CentOS
(херня какая-то, я ожидал наоборот, разработка то под RHEL, а deb'ы они alian'ом создают, позор)
PS
As I know you HP employee, try to open a ticket with all details.
working env:
Will check with some people @hp
ну да, херня получчается. впрочем, не впервой раз с Вертикой такое у меня
Can you repeat on test without LOCAL? With CentOS everything is ok.
PS
I will try to download RHEL evaluation.
It works only with COPY(without LOCAL) only.
Regards.
Will see what they respond.
Also if I remember correctly (and will go through my scripts to find exact names), Vertica 6.* supported reserved "dummy" columns in COPY, one of which were providing exactly source file name.