COPY many flat files to one table and save the file name to a column in the table?

I use COPY to load many flat files to one table. I want to save the file name to one column of the table in order to verify the data later. Below is an example of my COPY command. But I do not know how to save the file name to one column of the table in Vertica. Anybody has any ideas? thanks.


\set t_pwd 'pwd'
\set input_file '\'':t_pwd'/flat/*Payer.txt\''

truncate table _dicPayer;

copy _dicPayer
(
ID,
PAYERID,
PAYERNAME,
INSERTDATETIME
)
from  local  :input_file
escape as E'\001' delimiter E'|' skip 0 exceptions '/tmp/pb_ex' rejected data '/tmp/pb_rej'  direct;

Comments

  • Navin_CNavin_C Vertica Customer
    Hello,

    You can do this using a shell script
    #!/bin/bash
    INPUT_FILE="test_input_file.txt"
    vsql -c "copy test_file_name_insert
    (
    input_file_name as '$INPUT_FILE',
    id,
    name
    )
    from '/home/test_input_file.txt'
    delimiter '|' ;"
    Hope this helps

  • Thanks, Navin !!
    But the result is not what I need. I include * in the input file name as following example in order to import a lot of files once.
    \set t_pwd 'pwd'
    \set input_file '\'':t_pwd'/flat/aetname/*_MED_FILE.TXT\''

    The result is like below. The symbol * is NOT converted the real file name.
    /data/vertica/clinical/covenant/flat/aetname/*_MED_FILE.TXT

    If we are going to loop the files on OS level, we probably save the real file name into the table. Do you know how to loop the file in Linux or you have another easy idea?
    thanks.

  • Hi Wayne,

    Actually, there's another way to do this:  Vertica provides a built-in function "current_load_source()" that you can use as part of a COPY expression to extract the current filename:
    dbadmin=> create table t (i int, filename varchar);
    CREATE TABLE
    dbadmin=> copy t (i, filename as current_load_source()) from '/tmp/*.dat';
     Rows Loaded
    -------------
               3
    (1 row)

    dbadmin=> select * from t;
     i | filename
    ---+----------
     3 | c.dat
     2 | b.dat
     1 | a.dat
    (3 rows)
    Let us know if that helps.

    Alternatively, if you do want to iterate over files as part of a shell script, you could do something like:
    #!/bin/bash
    for INPUT_FILE in /path/to/file*.dat; do
    vsql -c "copy test_file_name_insert
    (
    input_file_name as '$INPUT_FILE',
    id,
    name
    )
    from '/home/test_input_file.txt'
    delimiter '|' ;"
    done
    However, this will cause Vertica to only see one file to copy at a time.  If you give us multiple files at once, we will parse them in parallel, resulting in improved performance.

    Adam
  • Navin_CNavin_C Vertica Customer
    what better way to do it  other then "current_load_source()"  :)

    Thanks Adam, This is something new I got to learn
  • There is no such function in Vertica 7.1.x documentation:
    Search returned this "Your search for "current_load_source" returned 0 result(s)."

  • Hi Oleg!

    I did a test:
    vsql=> select version();
    version
    ------------------------------------
    Vertica Analytic Database v7.1.1-2
    (1 row)

    CREATE TABLE IF NOT EXISTS LoadTest (
    rowid int,
    value varchar,
    source varchar
    );

    vsql=> \! cat /tmp/data01.txt
    1|foo

    vsql=> copy LoadTest(rowid, value, source as current_load_source()) from '/tmp/data01.txt' direct abort on error;

    vsql=> select * from LoadTest ;
    rowid | value | source
    -------+-------+------------
    1 | foo | data01.txt
    (1 row)

    daniel=> select * from LoadTest ;
    rowid | value | source
    -------+-------+-------------
    1 | foo | data01.txt
    1 | foo | Batch No. 1
    (2 rows)

    Regards.
  • Cool, so it is missing in documentation then.
    Good to know!
  • Hi!

    Be aware it require a FROM clause, without FROM clause function will return <unknown>.
  • Of course
  • Does not work for me:
    nvedwm1=> select version();
                  version
    ------------------------------------
     Vertica Analytic Database v7.1.1-0

    COPY cp_test (reconciliation_file_name AS current_load_source(), batch_id AS 123 ,file_name, records, net_pay_total, date_sent, total_table_count) FROM LOCAL '/landing/cp_test.txt' DELIMITER '|' TRAILING NULLCOLS EXCEPTIONS 'e.txt' REJECTED DATA 'r.txt'  ABORT ON ERROR;
     Rows Loaded
    -------------
              35

    select reconciliation_file_name from cp_test;
    --------------------------
     Batch No. 1
     Batch No. 1
     Batch No. 1
    ...






  • Hi!

    LOCAL :(
    LOCAL it is a streaming, i.e. its similar to
    cat data.csv | vsql -c "copy ... from stdin ..."
    Example:
    vsql=> copy LoadTest(rowid, value, source as current_load_source()) from local '/tmp/data01.txt' direct abort on error;
    Rows Loaded
    -------------
    1
    (1 row)

    vsql=> select * from LoadTest ;
    rowid | value | source
    -------+-------+-------------
    1 | foo | data01.txt
    1 | foo | Batch No. 1
    1 | foo | Batch No. 1
    (3 rows)

  • I am using LOCAL, I just edited COPY statement before posting it here, to remove some restricted names, etc, and accidentally removed LOCAL from my example above.

    That's not the issue, LOCAL, in here.
    My version is different, so it seems it was not supported there?
    I think some issues are with doc (as this function can be found in doc for 7.0) but not in 7.1.*
    And as in my example, it does not work in my version either
  • Hi!

    What is your environment?  Everything is works.
    daniel@synapse$ sudo dpkg -i /hdd/Vertica/DEB/vertica_7.1.1-0_amd64.deb
    Selecting previously unselected package vertica.
    (Reading database ... 302930 files and directories currently installed.)
    Preparing to unpack .../DEB/vertica_7.1.1-0_amd64.deb ...
    Unpacking vertica (7.1.1-0) ...
    ...

    daniel@synapse$ sudo /opt/vertica/sbin/install_vertica --host '127.0.0.1' --dba-user daniel --dba-group daniel --data-dir /vertica --license CE --accept-eula --failure-threshold HALT --clean


    daniel@synapse$ /opt/vertica/bin/vertica -V
    Vertica Analytic Database v7.1.1-0
    vertica(v7.1.1-0) built by release@build2.verticacorp.com from releases/VER_7_1_RELEASE_BUILD_1_0_20141016@148158 on 'Thu Oct 16 15:55:09 America/New_York 2014' $BuildId$

    # adminTools - create db
    *** Creating database: dev ***
    127.0.0.1 OK [vertica][(7, 1, 1)][000][all]
    ...

    daniel=> CREATE TABLE IF NOT EXISTS LoadTest (
    daniel(> rowid int,
    daniel(> value varchar,
    daniel(> source varchar
    daniel(> );

    daniel=> select * from LoadTest ;
    rowid | value | source
    -------+-------+----------
    1 | foo | data.txt
    (1 row)

  • Hi!

    What is your environment?  Everything is works.
    daniel@synapse$ sudo dpkg -i /hdd/Vertica/DEB/vertica_7.1.1-0_amd64.deb
    Selecting previously unselected package vertica.
    (Reading database ... 302930 files and directories currently installed.)
    Preparing to unpack .../DEB/vertica_7.1.1-0_amd64.deb ...
    Unpacking vertica (7.1.1-0) ...
    ...

    daniel@synapse$ sudo /opt/vertica/sbin/install_vertica --host '127.0.0.1' --dba-user daniel --dba-group daniel --data-dir /vertica --license CE --accept-eula --failure-threshold HALT --clean


    daniel@synapse$ /opt/vertica/bin/vertica -V
    Vertica Analytic Database v7.1.1-0
    vertica(v7.1.1-0) built by release@build2.verticacorp.com from releases/VER_7_1_RELEASE_BUILD_1_0_20141016@148158 on 'Thu Oct 16 15:55:09 America/New_York 2014' $BuildId$

    # adminTools - create db
    *** Creating database: dev ***
    127.0.0.1 OK [vertica][(7, 1, 1)][000][all]
    ...

    daniel=> CREATE TABLE IF NOT EXISTS LoadTest (
    daniel(> rowid int,
    daniel(> value varchar,
    daniel(> source varchar
    daniel(> );

    daniel=> select * from LoadTest ;
    rowid | value | source
    -------+-------+----------
    1 | foo | data.txt
    (1 row)

  • As you can see a version is

    daniel@synapse:~$ /opt/vertica/bin/vertica -V
    Vertica Analytic Database v7.1.1-0
    vertica(v7.1.1-0) built by release@build2.verticacorp.com from releases/VER_7_1_RELEASE_BUILD_1_0_20141016@148158 on 'Thu Oct 16 15:55:09 America/New_York 2014' $BuildId$

    Sorry missed a command:

    daniel@synapse:~$ cat /tmp/data.txt
    1|foo

    daniel=> copy LoadTest(rowid, value, source as current_load_source()) from '/tmp/data.txt' direct abort on error;
     Rows Loaded
    -------------
               1
    (1 row)

    daniel=> select * from LoadTest ;
     rowid | value |  source 
    -------+-------+----------
         1 | foo   | data.txt
    (1 row)



  • What I have:
    nvedwm1=> \! cat /etc/*-release
    3Q2014a
    LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
    Red Hat Enterprise Linux Server release 6.5 (Santiago)
    Red Hat Enterprise Linux Server release 6.5 (Santiago)

    Here is exact the same sample as you did:
    nvedwm1=> CREATE TABLE IF NOT EXISTS LoadTest (
    nvedwm1(>     rowid int,
    nvedwm1(>     value varchar,
    nvedwm1(>     source varchar
    nvedwm1(> );
    CREATE TABLE
    nvedwm1=> select * from LoadTest ;
     rowid | value | source
    -------+-------+--------
    (0 rows)

    nvedwm1=> \! vi data01.txt
    nvedwm1=>
    nvedwm1=> \! cat data01.txt
    1|foo

    nvedwm1=> copy LoadTest(rowid, value, source as current_load_source()) from local 'data01.txt' direct abort on error;
     Rows Loaded
    -------------
               1
    (1 row)

    nvedwm1=> select * from LoadTest;
     rowid | value |   source
    -------+-------+-------------
         1 | foo   | Batch No. 1
    (1 row)







  • Hi!

    Don't know what to say, some weird behavior. I tested on Ubuntu 14.04 LTS.
    I have no RHEL subscription, I will try to test on CentOS
    (херня какая-то, я ожидал наоборот, разработка то под RHEL, а deb'ы они alian'ом создают, позор)

    PS
    As I know you HP employee, try to open a ticket with all details.

    working env:
    daniel@synapse:~$ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description: Ubuntu 14.04.1 LTS
    Release: 14.04
    Codename: trusty

    daniel@synapse:~$ /opt/vertica/bin/vertica -V
    Vertica Analytic Database v7.1.1-0
    vertica(v7.1.1-0) built by release@build2.verticacorp.com from releases/VER_7_1_RELEASE_BUILD_1_0_20141016@148158 on 'Thu Oct 16 15:55:09 America/New_York 2014' $BuildId$
  • Yeah, will see. I can work around as anyway running all from Python. But that's weirdo.
    Will check with some people @hp
    ну да, херня получчается. впрочем, не впервой раз с Вертикой такое у меня
  • Hi!

    Can you repeat on test without LOCAL?
    nvedwm1=> copy LoadTest(rowid, value, source as current_load_source()) from local 'data01.txt' direct abort on error;
    With CentOS  everything is ok.

    PS
    I will try to download RHEL evaluation.
  • I am using vsql client, not running COPY directly on Vertica node, so cannot use COPY w/o LOCAL actually.
  • COPY LOCAL always will give you <Batch No. #>, even if it will be on Vertica node.
    It works only with COPY(without LOCAL) only.

    Regards.
  • Not sure, there is JIRA opened re: this issue today, with Vertica.
    Will see what they respond.
    Also if I remember correctly (and will go through my scripts to find exact names), Vertica 6.* supported reserved "dummy" columns in COPY, one of  which were providing exactly source file name.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file