get_num_rejected_rows() return value x 2
Hi,
I try to get number of rejected lines but the function get_num_rejected_rows() give me a result x 2.
Here is my dockerfile on which I run my vertica 9 environment:
https://github.com/ypiel-talend/docker-vertica/blob/ypiel/docker-vertica9/Dockerfile
You have to get the vertica 9 .deb installer for ubuntu/debian
then rename it as vertica9.deb
and then run docker command : docker build -t vertica9 .
all files must be in the same folder : Dockerfile and vertica9.deb
Then run the container: docker run -p 5433:5433 docker9
Here is what I do:
-------------------------- CREATE TABLE IF NOT EXISTS Public.customers ( id integer PRIMARY KEY NOT NULL, name varchar(10), age integer, comment varchar(10) ); COPY Public.customers FROM LOCAL 'C:/temp/data.txt' STREAM NAME 'talend' DELIMITER ';' TRAILING NULLCOLS SKIP 1 RECORD TERMINATOR E'\n' ENCLOSED BY E'"' ESCAPE AS '\' ENFORCELENGTH REJECTMAX 1000 NULL 'none' DIRECT REJECTED DATA 'C:/temp/Vertica_rejected.csv' EXCEPTIONS 'C:/temp/Vertica_exception.log' NO COMMIT select get_num_rejected_rows() --------------------------
In attachment the data.txt file with 3 wrong ID (5, 10, 15) that are rejected.
The result of get_num_rejected_rows() is 6.
Can you explain me why this value is multiplied by 2 ? Do I miss something ?
Thanks for help.
Comments
Hi,
Seems to work for me, although I am running the following on my single node cluster...
What does the get_num_accepted_rows() function show you? 97 or 194?
Hi Jim,
Thank you for your feedback.
select get_num_accepted_rows(); : 96
select get_num_rejected_rows(); : 6 which should be 3
There 100 lines into the file. The first one is skipped by 'SKIP 1' and lines 5, 10 and 15 have 'X' in the id that's why they are rejected.
100 - 1 - 3 = 96 which is the valid accepted num rows
Did you test with the docker I have sent you ?
Could you test with my query please ? Yours is not exactly the same...
Did you try with LOCAL option ? COPY table FROM LOCAL
Here are the test I have done:
returned values are ok 99 for accepted, 0 for rejected
I set one 'X' on a ID : not ok
-- it returns 98 for accepted but 2 for rejected
-- In rejected file I have one line :
x5;Jimmy;33;3iQovN
In exception file I have one exception :
COPY: Input record 4 has been rejected (Invalid integer format 'x5' for column 1 (id)). Please see C:/temp/Vertica_rejected.csv, record 1 for the rejected record. This record was read from C:\temp\data.csv
I set 'X' on ID of two lines : not ok
Rejected file contains two lines :
x5;Jimmy;33;3iQovN
x10;Jimmy;69;WwzOFl
In exception file I have two exceptions :
COPY: Input record 4 has been rejected (Invalid integer format 'x5' for column 1 (id)). Please see C:/temp/Vertica_rejected.csv, record 1 for the rejected record. This record was read from C:\temp\data.csv
COPY: Input record 9 has been rejected (Invalid integer format 'x10' for column 1 (id)). Please see C:/temp/Vertica_rejected.csv, record 2 for the rejected record. This record was read from C:\temp\data.csv
Best regards
Hi Jim,
Thank you for your feedback.
Did you try on the docker I have sent you ?
Your query is not exactly the same the one I have sent you too...
Here is what I have tested :
I remove all 'X' from the Id column so my file shouldn't have any reject : OK
** get_num_accepted_rows : 99 :ok
** get_num_rejected_rows : 0 : ok
** Rejected file : empty : ok
** Exception file : empty : ok
I set a 'X' on the ID of the 5th line : KO
** get_num_accepted_rows : 98 : ok
** get_num_rejected_rows : 2 : KO
** Rejected file : The rejected line : ok
*** x5;Jimmy;33;3iQovN
** Exception file : 1 exception : ok
*** COPY: Input record 4 has been rejected (Invalid integer format 'x5' for column 1 (id)). Please see C:/temp/Vertica_rejected.csv, record 1 for the rejected record. This record was read from C:\temp\data.csv
I set a 'X' on the ID of the 10th line : KO
** get_num_accepted_rows : 97 : ok
** get_num_rejected_rows : 4 : KO
** Rejected file : The rejected line : ok
*** x5;Jimmy;33;3iQovN
*** x10;Jimmy;69;WwzOFl
** Exception file : 1 exception : ok
*** COPY: Input record 4 has been rejected (Invalid integer format 'x5' for column 1 (id)). Please see C:/temp/Vertica_rejected.csv, record 1 for the rejected record. This record was read from C:\temp\data.csv
*** COPY: Input record 9 has been rejected (Invalid integer format 'x10' for column 1 (id)). Please see C:/temp/Vertica_rejected.csv, record 2 for the rejected record. This record was read from C:\temp\data.csv
And so on... The num rejected row is always twice the real number of rejected rows.
I tried to minimize the number of options and I have the same result with this query:
Here is my last test:
I do the same query as you :
COPY Public.customers FROM '/tmp/shared/data.csv' STREAM NAME 'talend' DELIMITER ';' REJECTMAX 1000 NULL 'none' REJECTED DATA '/tmp/Vertica_rejected.csv' exceptions '/tmp/Vertica_exception.log' DIRECT;
I have copied the data.csv file in a shared folder of the container and without 'LOCAL' option it works.
The only difference is that I use
COPY mytable FROM LOCAL '/my/file'
and you doCOPY mytable FROM '/my/file'
.Could you have a test with a COPY ... FROM LOCAL ... ?
Best regards.
Hi,
You are right! The GET_NUM_REJECTED_ROWS function does return double the actual after using a COPY LOCAL!
I will open a JIRA bug ticket and follow up with you. Thanks for pointing this out!
@ypiel - Fyi ...I found out that this issue has been fixed and is targeted to be included in Vertica 9.0SP1.
Great, thanks a lot.
Has this ever been fixed? I am experiencing the same issue running on Vertica 9.2.0 .
Can you post an example?
This simple test seems to work as expected:
I even re-tested the original poster's data and table. It works okay now in 9.2.