copy from local, autogenerate line number
Is there any possibility to autogenerate (not with increment at db level) a line number during the import of a local file?
create table TEST_RN_COPY (rn integer, c1 varchar(10), c2 varchar(10));
copy dwh.TEST_RN_COPY (c1,c2) from local 'delimited.txt' delimiter ',';
How to filll the RN??
The information exists sowewhere in Vertica, see the next "error line" from the exceptions file
COPY: Input record 2 has been rejected (Invalid integer format 'row2' for column 1 (rn)).
Please see , record 2 for the rejected record. This record was record 2 from wq\delimited.txt
Regards,
Alex
Best Answers
-
Jim_Knicely Administrator
Then don't use STDIN
dbadmin=> TRUNCATE TABLE alpha; TRUNCATE TABLE dbadmin=> \! cat -n /home/dbadmin/delimited.txt | column -t -o '|' > /home/dbadmin/delimited_with_line_numbers.txt dbadmin=> COPY alpha FROM '/home/dbadmin/delimited_with_line_numbers.txt'; Rows Loaded ------------- 26 (1 row) dbadmin=> SELECT * FROM alpha; c1 | c2 ----+---- 1 | A 2 | B 3 | C 4 | D 5 | E 6 | F 7 | G 8 | H 9 | I 10 | J 11 | K 12 | L 13 | M 14 | N 15 | O 16 | P 17 | Q 18 | R 19 | S 20 | T 21 | U 22 | V 23 | W 24 | X 25 | Y 26 | Z (26 rows)
0 -
Sorry @Jim_Knicely,
the "delimited with line numbers" is very slow and changes completly the output file, splitting the whole input file in columnsdbadmin=> ! cat -n /home/dbadmin/delimited.txt | column -t -o '|' > /home/dbadmin/delimited_with_line_numbers.txt
dbadmin=> COPY alpha FROM '/home/dbadmin/delimited_with_line_numbers.txt';
But you give me an ideea
https://unix.stackexchange.com/questions/222218/how-to-add-line-numbers-in-every-line-using-shell-command
I can use
awk '{print NR "|" $s}' input_file > output_file
The last line of the output:
61369814|61120000000000000000810000008100000081000000810G 0000000000000A
Regards,
Alex0
Answers
There is no way to add an expression of the type
column_n AS <some_kind_of_row_sequence
to the COPY command.A possible workaround was posted some 4 years ago - I don't know whether that can help you ...
https://forum.vertica.com/discussion/237457/add-sequence-numbers-to-a-bulk-load-in-vertica
Hi Alex. You could write your own parser and populate the
rn
column with any sequence you like. See the UD parser documentation for that.https://www.vertica.com/docs/10.1.x/HTML/Content/Authoring/ExtendingVertica/UDx/UDL/UDLParser.htm
(If you want to be strict and match the first record with sequence=1, the second with 2, etc. you'd need to prevent parallel loads. But will definitely hinder performance at scale.)
Also, the record number you see in Vertica logs is generally relative to the block of data being parsed. For some parsers, the block of data is the file and you will get an absolute row number (csv parser is an example of that). But other parsers chunk the input file and run parallel loads. In that case, you don't have an absolute row number. But the error message will tell you what went wrong parsing the specific record, like data type coercion or trying to load null value in a non-null column, etc.
I am curious to know more details about your use case. Can you elaborate more on what you are trying to accomplish? Why an identity column does not satisfy your needs?
Maybe like this?
@Jim_Knicely That'd be another way to prevent parallel loads.
COPY FROM STDIN
is single threaded. Fine for small data. I'm still curious about the fundamental use case.Hi @Jim_Knicely ,
using cat to add a new colum to the file seems to be the best sollution. We will load some "big" text files, which will be further parsed to some relational format. The "original" line number is needed for debugging.
Regards
Alex
Hi @Jim_Knicely ,
I think that the sollution with delimited_with_line_numbers.txt is the best one.
Use case: we have to load some "big" text files. These text files will be afterwards parsed to some relational format. In order to identify possible errors like "Numeric expected in file "some file name" at line XXXX at positions 120 .. 122 but spaces found" we need that line number.
Regards
Alex
Hi @Jim_Knicely ,
I think that the sollution with delimited_with_line_numbers.txt is the best one.
Use case: we have to load some "big" text files. These text files will be afterwards parsed to some relational format. In order to identify possible errors like "Numeric expected in file "some file name" at line XXXX at positions 120 .. 122 but spaces found" we need that line number.
Regard
Alex