get_num_rejected_rows() return value x 2

edited November 2017 in General Discussion

Hi,

I try to get number of rejected lines but the function get_num_rejected_rows() give me a result x 2.

Here is my dockerfile on which I run my vertica 9 environment:
https://github.com/ypiel-talend/docker-vertica/blob/ypiel/docker-vertica9/Dockerfile

You have to get the vertica 9 .deb installer for ubuntu/debian
then rename it as vertica9.deb
and then run docker command : docker build -t vertica9 .
all files must be in the same folder : Dockerfile and vertica9.deb
Then run the container: docker run -p 5433:5433 docker9

Here is what I do:

    --------------------------
    CREATE TABLE IF NOT EXISTS Public.customers (
      id integer PRIMARY KEY NOT NULL,
      name varchar(10),
      age integer,
      comment varchar(10)
    );

    COPY Public.customers FROM LOCAL 'C:/temp/data.txt' 
                  STREAM NAME 'talend' DELIMITER ';' 
                  TRAILING NULLCOLS SKIP 1 RECORD TERMINATOR E'\n' 
                  ENCLOSED BY E'"' ESCAPE AS '\' ENFORCELENGTH 
                  REJECTMAX 1000 NULL 'none' 
                  DIRECT REJECTED DATA 'C:/temp/Vertica_rejected.csv' EXCEPTIONS 'C:/temp/Vertica_exception.log' NO COMMIT

    select get_num_rejected_rows()
    --------------------------

In attachment the data.txt file with 3 wrong ID (5, 10, 15) that are rejected.
The result of get_num_rejected_rows() is 6.

Can you explain me why this value is multiplied by 2 ? Do I miss something ?

Thanks for help.

Comments

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Hi,

    Seems to work for me, although I am running the following on my single node cluster...

    dbadmin=> select version();
                  version
    ------------------------------------
     Vertica Analytic Database v9.0.0-1
    (1 row)
    
    dbadmin=> COPY Public.customers FROM  '/home/dbadmin/data.txt' STREAM NAME 'talend' DELIMITER ';' REJECTMAX 1000 NULL 'none' REJECTED DATA '/home/dbadmin/customers.rej' exceptions '/home/dbadmin/customers.exp' DIRECT;
    NOTICE 7850:  In a multi-threaded load, rejected record data may be written to additional files
    HINT:  Exceptions may be written to files [/home/dbadmin/customers.exp], [/home/dbadmin/customers.exp.1], etc
     Rows Loaded
    -------------
              97
    (1 row)
    
    dbadmin=> select get_num_rejected_rows();
     get_num_rejected_rows
    -----------------------
                         3
    (1 row)
    

    What does the get_num_accepted_rows() function show you? 97 or 194?

    dbadmin=> select get_num_accepted_rows();
     get_num_accepted_rows
    -----------------------
                        97
    (1 row)
    
  • Hi Jim,

    Thank you for your feedback.

    select get_num_accepted_rows(); : 96
    select get_num_rejected_rows(); : 6 which should be 3

    There 100 lines into the file. The first one is skipped by 'SKIP 1' and lines 5, 10 and 15 have 'X' in the id that's why they are rejected.
    100 - 1 - 3 = 96 which is the valid accepted num rows

    Did you test with the docker I have sent you ?

    Could you test with my query please ? Yours is not exactly the same...
    Did you try with LOCAL option ? COPY table FROM LOCAL

    Here are the test I have done:

    • I remove all 'X' char in ID column so 0 reject : it's ok
    • returned values are ok 99 for accepted, 0 for rejected

    • I set one 'X' on a ID : not ok
      -- it returns 98 for accepted but 2 for rejected
      -- In rejected file I have one line :
      x5;Jimmy;33;3iQovN

    • In exception file I have one exception :
      COPY: Input record 4 has been rejected (Invalid integer format 'x5' for column 1 (id)). Please see C:/temp/Vertica_rejected.csv, record 1 for the rejected record. This record was read from C:\temp\data.csv

    • I set 'X' on ID of two lines : not ok

    • it returns 97 for accepted but 4 for rejected
    • Rejected file contains two lines :
      x5;Jimmy;33;3iQovN
      x10;Jimmy;69;WwzOFl

    • In exception file I have two exceptions :
      COPY: Input record 4 has been rejected (Invalid integer format 'x5' for column 1 (id)). Please see C:/temp/Vertica_rejected.csv, record 1 for the rejected record. This record was read from C:\temp\data.csv
      COPY: Input record 9 has been rejected (Invalid integer format 'x10' for column 1 (id)). Please see C:/temp/Vertica_rejected.csv, record 2 for the rejected record. This record was read from C:\temp\data.csv

    Best regards

  • edited November 2017

    Hi Jim,

    Thank you for your feedback.

    Did you try on the docker I have sent you ?
    Your query is not exactly the same the one I have sent you too...

    Here is what I have tested :

    • I remove all 'X' from the Id column so my file shouldn't have any reject : OK
      ** get_num_accepted_rows : 99 :ok
      ** get_num_rejected_rows : 0 : ok
      ** Rejected file : empty : ok
      ** Exception file : empty : ok

    • I set a 'X' on the ID of the 5th line : KO
      ** get_num_accepted_rows : 98 : ok
      ** get_num_rejected_rows : 2 : KO
      ** Rejected file : The rejected line : ok
      *** x5;Jimmy;33;3iQovN
      ** Exception file : 1 exception : ok
      *** COPY: Input record 4 has been rejected (Invalid integer format 'x5' for column 1 (id)). Please see C:/temp/Vertica_rejected.csv, record 1 for the rejected record. This record was read from C:\temp\data.csv

    • I set a 'X' on the ID of the 10th line : KO
      ** get_num_accepted_rows : 97 : ok
      ** get_num_rejected_rows : 4 : KO
      ** Rejected file : The rejected line : ok
      *** x5;Jimmy;33;3iQovN
      *** x10;Jimmy;69;WwzOFl
      ** Exception file : 1 exception : ok
      *** COPY: Input record 4 has been rejected (Invalid integer format 'x5' for column 1 (id)). Please see C:/temp/Vertica_rejected.csv, record 1 for the rejected record. This record was read from C:\temp\data.csv
      *** COPY: Input record 9 has been rejected (Invalid integer format 'x10' for column 1 (id)). Please see C:/temp/Vertica_rejected.csv, record 2 for the rejected record. This record was read from C:\temp\data.csv

    And so on... The num rejected row is always twice the real number of rejected rows.

  • I tried to minimize the number of options and I have the same result with this query:

    COPY Public.customers FROM LOCAL 'C:/temp/data_unix.csv' 
                  REJECTED DATA 'C:/temp/Vertica_rejected.csv'
                  EXCEPTIONS 'C:/temp/Vertica_exception.log'
    
  • edited December 2017

    Here is my last test:
    I do the same query as you :
    COPY Public.customers FROM '/tmp/shared/data.csv' STREAM NAME 'talend' DELIMITER ';' REJECTMAX 1000 NULL 'none' REJECTED DATA '/tmp/Vertica_rejected.csv' exceptions '/tmp/Vertica_exception.log' DIRECT;
    I have copied the data.csv file in a shared folder of the container and without 'LOCAL' option it works.

    The only difference is that I use COPY mytable FROM LOCAL '/my/file' and you do COPY mytable FROM '/my/file'.

    Could you have a test with a COPY ... FROM LOCAL ... ?

    Best regards.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited December 2017

    Hi,

    You are right! The GET_NUM_REJECTED_ROWS function does return double the actual after using a COPY LOCAL!

    dbadmin=> COPY Public.customers FROM '/home/dbadmin/data.txt' STREAM NAME 'talend' DELIMITER ';' REJECTMAX 1000 NULL 'none' REJECTED DATA '/home/dbadmin/customers.rej' exceptions '/home/dbadmin/customers.exp' DIRECT;
    NOTICE 7850:  In a multi-threaded load, rejected record data may be written to additional files
    HINT:  Exceptions may be written to files [/home/dbadmin/customers.exp], [/home/dbadmin/customers.exp.1], etc
     Rows Loaded
    -------------
              97
    (1 row)
    
    dbadmin=> select get_num_rejected_rows();
     get_num_rejected_rows
    -----------------------
                         3
    (1 row)
    
    dbadmin=> COPY Public.customers FROM LOCAL '/home/dbadmin/data.txt' STREAM NAME 'talend' DELIMITER ';' REJECTMAX 1000 NULL 'none' REJECTED DATA '/home/dbadmin/customers.rej' exceptions '/home/dbadmin/customers.exp' DIRECT;
     Rows Loaded
    -------------
              97
    (1 row)
    
    dbadmin=> select get_num_rejected_rows();
     get_num_rejected_rows
    -----------------------
                         6
    (1 row)
    

    I will open a JIRA bug ticket and follow up with you. Thanks for pointing this out!

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited December 2017

    @ypiel - Fyi ...I found out that this issue has been fixed and is targeted to be included in Vertica 9.0SP1.

  • Great, thanks a lot.

  • Has this ever been fixed? I am experiencing the same issue running on Vertica 9.2.0 .

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited September 2019

    Can you post an example?

    This simple test seems to work as expected:

    dbadmin=> SELECT version();
                  version
    ------------------------------------
     Vertica Analytic Database v9.2.1-6
    (1 row)
    
    
    dbadmin=> CREATE TABLE another_test (c INT);
    CREATE TABLE
    
    dbadmin=> \! cat /home/dbadmin/another_test.txt
    1
    2
    three
    4
    five
    six
    7
    
    dbadmin=> COPY another_test FROM LOCAL '/home/dbadmin/another_test.txt';
     Rows Loaded
    -------------
               4
    (1 row)
    
    dbadmin=> select get_num_rejected_rows();
     get_num_rejected_rows
    -----------------------
                         3
    (1 row)
    
    dbadmin=> COPY another_test FROM LOCAL '/home/dbadmin/another_test.txt' REJECTED DATA '/home/dbadmin/another_example.rej' EXCEPTIONS '/home/dbadmin/another_example.exp' DIRECT;
     Rows Loaded
    -------------
               4
    (1 row)
    
    dbadmin=> SELECT get_num_rejected_rows();
     get_num_rejected_rows
    -----------------------
                         3
    (1 row)
    
    dbadmin=> \! cat /home/dbadmin/another_example.rej
    three
    five
    six
    
  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    I even re-tested the original poster's data and table. It works okay now in 9.2.

    dbadmin=> CREATE TABLE IF NOT EXISTS Public.customers (
    dbadmin(>       id integer PRIMARY KEY NOT NULL,
    dbadmin(>       name varchar(10),
    dbadmin(>       age integer,
    dbadmin(>       comment varchar(10)
    dbadmin(>     );
    CREATE TABLE
    
    dbadmin=> COPY Public.customers FROM LOCAL '/home/dbadmin/data.txt' STREAM NAME 'talend' DELIMITER ';' REJECTMAX 1000 NULL 'none' REJECTED DATA '/home/dbadmin/customers.rej' exceptions '/home/dbadmin/customers.exp' DIRECT;
     Rows Loaded
    -------------
              97
    (1 row)
    
    dbadmin=> select get_num_rejected_rows();
     get_num_rejected_rows
    -----------------------
                         3
    (1 row)
    
    dbadmin=> \! cat /home/dbadmin/customers.rej
    X5;Jimmy;33;3iQovN
    X10;Jimmy;69;WwzOFl
    X15;Andrew;67;0sebB3
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file