copy from local, autogenerate line number

edited February 2021 in General Discussion

Is there any possibility to autogenerate (not with increment at db level) a line number during the import of a local file?
create table TEST_RN_COPY (rn integer, c1 varchar(10), c2 varchar(10));
copy dwh.TEST_RN_COPY (c1,c2) from local 'delimited.txt' delimiter ',';
How to filll the RN??
The information exists sowewhere in Vertica, see the next "error line" from the exceptions file
COPY: Input record 2 has been rejected (Invalid integer format 'row2' for column 1 (rn)).
Please see , record 2 for the rejected record. This record was record 2 from wq\delimited.txt

Regards,
Alex

Tagged:

Best Answers

  • Jim_KnicelyJim_Knicely Administrator
    Answer ✓

    Then don't use STDIN :)

    dbadmin=> TRUNCATE TABLE alpha;
    TRUNCATE TABLE
    
    dbadmin=> \! cat -n /home/dbadmin/delimited.txt | column -t -o '|' > /home/dbadmin/delimited_with_line_numbers.txt
    
    dbadmin=> COPY alpha FROM '/home/dbadmin/delimited_with_line_numbers.txt';
     Rows Loaded
    -------------
              26
    (1 row)
    
    dbadmin=> SELECT * FROM alpha;
     c1 | c2
    ----+----
      1 | A
      2 | B
      3 | C
      4 | D
      5 | E
      6 | F
      7 | G
      8 | H
      9 | I
     10 | J
     11 | K
     12 | L
     13 | M
     14 | N
     15 | O
     16 | P
     17 | Q
     18 | R
     19 | S
     20 | T
     21 | U
     22 | V
     23 | W
     24 | X
     25 | Y
     26 | Z
    (26 rows)
    
    
  • Answer ✓

    Sorry @Jim_Knicely,
    the "delimited with line numbers" is very slow and changes completly the output file, splitting the whole input file in columns

    dbadmin=> ! cat -n /home/dbadmin/delimited.txt | column -t -o '|' > /home/dbadmin/delimited_with_line_numbers.txt

    dbadmin=> COPY alpha FROM '/home/dbadmin/delimited_with_line_numbers.txt';

    But you give me an ideea
    https://unix.stackexchange.com/questions/222218/how-to-add-line-numbers-in-every-line-using-shell-command
    I can use
    awk '{print NR "|" $s}' input_file > output_file
    The last line of the output:
    61369814|61120000000000000000810000008100000081000000810G 0000000000000A
    Regards,
    Alex

Answers

  • marcothesanemarcothesane - Select Field - Administrator

    There is no way to add an expression of the type column_n AS <some_kind_of_row_sequence to the COPY command.
    A possible workaround was posted some 4 years ago - I don't know whether that can help you ...
    https://forum.vertica.com/discussion/237457/add-sequence-numbers-to-a-bulk-load-in-vertica

  • Hi Alex. You could write your own parser and populate the rn column with any sequence you like. See the UD parser documentation for that.
    https://www.vertica.com/docs/10.1.x/HTML/Content/Authoring/ExtendingVertica/UDx/UDL/UDLParser.htm
    (If you want to be strict and match the first record with sequence=1, the second with 2, etc. you'd need to prevent parallel loads. But will definitely hinder performance at scale.)
    Also, the record number you see in Vertica logs is generally relative to the block of data being parsed. For some parsers, the block of data is the file and you will get an absolute row number (csv parser is an example of that). But other parsers chunk the input file and run parallel loads. In that case, you don't have an absolute row number. But the error message will tell you what went wrong parsing the specific record, like data type coercion or trying to load null value in a non-null column, etc.
    I am curious to know more details about your use case. Can you elaborate more on what you are trying to accomplish? Why an identity column does not satisfy your needs?

  • Jim_KnicelyJim_Knicely Administrator
    edited February 2021

    Maybe like this?

    dbadmin=> CREATE TABLE alpha (c1 INT, c2 VARCHAR);
    CREATE TABLE
    
    dbadmin=> \! cat /home/dbadmin/delimited.txt
    A
    B
    C
    D
    E
    F
    G
    H
    I
    J
    K
    L
    M
    N
    O
    P
    Q
    R
    S
    T
    U
    V
    W
    X
    Y
    Z
    
    dbadmin=> \! cat -n delimited.txt | column -t -o '|' | vsql -c "COPY alpha FROM STDIN;"
    dbadmin=> SELECT * FROM alpha ORDER BY c1;
     c1 | c2
    ----+----
      1 | A
      2 | B
      3 | C
      4 | D
      5 | E
      6 | F
      7 | G
      8 | H
      9 | I
     10 | J
     11 | K
     12 | L
     13 | M
     14 | N
     15 | O
     16 | P
     17 | Q
     18 | R
     19 | S
     20 | T
     21 | U
     22 | V
     23 | W
     24 | X
     25 | Y
     26 | Z
    (26 rows)
    
  • @Jim_Knicely That'd be another way to prevent parallel loads. COPY FROM STDIN is single threaded. Fine for small data. I'm still curious about the fundamental use case.

  • Hi @Jim_Knicely ,
    using cat to add a new colum to the file seems to be the best sollution. We will load some "big" text files, which will be further parsed to some relational format. The "original" line number is needed for debugging.
    Regards
    Alex

  • Hi @Jim_Knicely ,
    I think that the sollution with delimited_with_line_numbers.txt is the best one.
    Use case: we have to load some "big" text files. These text files will be afterwards parsed to some relational format. In order to identify possible errors like "Numeric expected in file "some file name" at line XXXX at positions 120 .. 122 but spaces found" we need that line number.
    Regards
    Alex

  • Hi @Jim_Knicely ,
    I think that the sollution with delimited_with_line_numbers.txt is the best one.
    Use case: we have to load some "big" text files. These text files will be afterwards parsed to some relational format. In order to identify possible errors like "Numeric expected in file "some file name" at line XXXX at positions 120 .. 122 but spaces found" we need that line number.
    Regard
    Alex

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file