How to force a rollback of a COPY command

yamakawayamakawa Vertica Customer

Hi Everyone.

Currently, the process of loading CSV data into Vertica is as follows

# csv_converter: Module that performs format conversion of CSV files (delimiters, end-of-line characters)
# load_csv.py: Load data using COPY command (python)

cat ${INPUT_CSV_FILE} | csv_converter | load_csv.py

At this time, if an error occurs in csv_converter, load with the COPY command.
I want to force termination and rollback.

With the current method, if an invalid record is found in csv_converter
The module will terminate, but normal records prior to that will be loaded into Vertica.

Since we want to have all records registered or not registered in each CSV,
we need to make sure that all the records are loaded into Vertica.
The above behavior is undesirable.

If anyone knows of a better way to do this, we would be very interested in hearing about it.

That's all, thank you for reading.

Answers

  • Bryan_HBryan_H Vertica Employee Administrator

    I'm not sure how you scripted the load_csv.py, so I will offer a few options. You may be able to add one of the following to the COPY command:
    ABORT ON ERROR - Specifies that COPY stops if any row is rejected. The statement is rolled back and no data is loaded.
    NO COMMIT - COPY does not auto-commit. The Python script could load more batches with additional COPY...NO COMMIT and at the end, send COMMIT or ROLLBACK depending whether the upstream process succeeded.
    Please see the documentation for more info: https://forum.vertica.com/discussion/242620/how-to-force-a-rollback-of-a-copy-command

  • yamakawayamakawa Vertica Customer

    dear @Bryan_H

    Thank you for your response.

    Currently, in load_csv.py, I am running the COPY command with the ABORT ON ERROR option.
    When loading in Vertica,
    considering the case that a newline is included in an item, etc.,
    the CSV is parsed in the csv_converter module,
    and then the CSV is created with the delimiter and line end character replaced with special characters.

    If CSV parsing fails in csv_converter,
    an error is handled,
    but I want the subsequent load_csv.py to terminate abnormally when that error occurs,
    but I am considering what to do about that linkage part.

    It is possible to cause an ABORT ON ERROR by outputting a different number of dummy data than the number of table items,
    but I was wondering if there is an elegant solution.

  • Bryan_HBryan_H Vertica Employee Administrator

    The following example sets autocommit to false, and executes COPY statement with ABORT ON ERROR and NO COMMIT. There is no Python error handler, so if the execute throws an exception, the script exits and no rows are committed. If the execute succeeds, then "Rows Loaded" count is printed and rows are committed.

    import sys
    import vertica_python

    conn_info = {'host': '127.0.0.1',
    'user': 'x',
    'password': 'x',
    'database': 'x',
    }

    with vertica_python.connect(**conn_info) as connection:
    connection.autocommit = False
    cur = connection.cursor()
    cur.execute("COPY forum242640(a, b) FROM LOCAL '242640_data.csv' DELIMITER ',' ABORT ON ERROR NO COMMIT", buffer_size=65536)
    print("Rows loaded:", cur.fetchall())
    connection.commit()

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file