copy from local, autogenerate line number

alexisos · February 2021

Is there any possibility to autogenerate (not with increment at db level) a line number during the import of a local file?
create table TEST_RN_COPY (rn integer, c1 varchar(10), c2 varchar(10));
copy dwh.TEST_RN_COPY (c1,c2) from local 'delimited.txt' delimiter ',';
How to filll the RN??
The information exists sowewhere in Vertica, see the next "error line" from the exceptions file
COPY: Input record 2 has been rejected (Invalid integer format 'row2' for column 1 (rn)).
Please see , record 2 for the rejected record. This record was record 2 from wq\delimited.txt

Regards,
Alex

Jim_Knicely · February 2021

Then don't use STDIN

dbadmin=> TRUNCATE TABLE alpha;
TRUNCATE TABLE

dbadmin=> \! cat -n /home/dbadmin/delimited.txt | column -t -o '|' > /home/dbadmin/delimited_with_line_numbers.txt

dbadmin=> COPY alpha FROM '/home/dbadmin/delimited_with_line_numbers.txt';
 Rows Loaded
-------------
          26
(1 row)

dbadmin=> SELECT * FROM alpha;
 c1 | c2
----+----
  1 | A
  2 | B
  3 | C
  4 | D
  5 | E
  6 | F
  7 | G
  8 | H
  9 | I
 10 | J
 11 | K
 12 | L
 13 | M
 14 | N
 15 | O
 16 | P
 17 | Q
 18 | R
 19 | S
 20 | T
 21 | U
 22 | V
 23 | W
 24 | X
 25 | Y
 26 | Z
(26 rows)

alexisos · February 2021

Sorry @Jim_Knicely,
the "delimited with line numbers" is very slow and changes completly the output file, splitting the whole input file in columns

dbadmin=> ! cat -n /home/dbadmin/delimited.txt | column -t -o '|' > /home/dbadmin/delimited_with_line_numbers.txt

dbadmin=> COPY alpha FROM '/home/dbadmin/delimited_with_line_numbers.txt';

But you give me an ideea
https://unix.stackexchange.com/questions/222218/how-to-add-line-numbers-in-every-line-using-shell-command
I can use
awk '{print NR "|" $s}' input_file > output_file
The last line of the output:
61369814|61120000000000000000810000008100000081000000810G 0000000000000A
Regards,
Alex

marcothesane · February 2021

There is no way to add an expression of the type column_n AS <some_kind_of_row_sequence to the COPY command.
A possible workaround was posted some 4 years ago - I don't know whether that can help you ...
https://forum.vertica.com/discussion/237457/add-sequence-numbers-to-a-bulk-load-in-vertica

Ariel_Cary · February 2021

Hi Alex. You could write your own parser and populate the rn column with any sequence you like. See the UD parser documentation for that.
https://www.vertica.com/docs/10.1.x/HTML/Content/Authoring/ExtendingVertica/UDx/UDL/UDLParser.htm
(If you want to be strict and match the first record with sequence=1, the second with 2, etc. you'd need to prevent parallel loads. But will definitely hinder performance at scale.)
Also, the record number you see in Vertica logs is generally relative to the block of data being parsed. For some parsers, the block of data is the file and you will get an absolute row number (csv parser is an example of that). But other parsers chunk the input file and run parallel loads. In that case, you don't have an absolute row number. But the error message will tell you what went wrong parsing the specific record, like data type coercion or trying to load null value in a non-null column, etc.
I am curious to know more details about your use case. Can you elaborate more on what you are trying to accomplish? Why an identity column does not satisfy your needs?

Jim_Knicely · February 2021

Maybe like this?

dbadmin=> CREATE TABLE alpha (c1 INT, c2 VARCHAR);
CREATE TABLE

dbadmin=> \! cat /home/dbadmin/delimited.txt
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z

dbadmin=> \! cat -n delimited.txt | column -t -o '|' | vsql -c "COPY alpha FROM STDIN;"
dbadmin=> SELECT * FROM alpha ORDER BY c1;
 c1 | c2
----+----
  1 | A
  2 | B
  3 | C
  4 | D
  5 | E
  6 | F
  7 | G
  8 | H
  9 | I
 10 | J
 11 | K
 12 | L
 13 | M
 14 | N
 15 | O
 16 | P
 17 | Q
 18 | R
 19 | S
 20 | T
 21 | U
 22 | V
 23 | W
 24 | X
 25 | Y
 26 | Z
(26 rows)

Ariel_Cary · February 2021

@Jim_Knicely That'd be another way to prevent parallel loads. COPY FROM STDIN is single threaded. Fine for small data. I'm still curious about the fundamental use case.

alexisos · February 2021

Hi @Jim_Knicely ,
using cat to add a new colum to the file seems to be the best sollution. We will load some "big" text files, which will be further parsed to some relational format. The "original" line number is needed for debugging.
Regards
Alex

alexisos · February 2021

Hi @Jim_Knicely ,
I think that the sollution with delimited_with_line_numbers.txt is the best one.
Use case: we have to load some "big" text files. These text files will be afterwards parsed to some relational format. In order to identify possible errors like "Numeric expected in file "some file name" at line XXXX at positions 120 .. 122 but spaces found" we need that line number.
Regards
Alex

alexisos · February 2021

Hi @Jim_Knicely ,
I think that the sollution with delimited_with_line_numbers.txt is the best one.
Use case: we have to load some "big" text files. These text files will be afterwards parsed to some relational format. In order to identify possible errors like "Numeric expected in file "some file name" at line XXXX at positions 120 .. 122 but spaces found" we need that line number.
Regard
Alex

We're Moving!

Create My New Community Account Now

copy from local, autogenerate line number

Best Answers

Answers

Leave a Comment