How to force a rollback of a COPY command
yamakawa
Vertica Customer ✭
Hi Everyone.
Currently, the process of loading CSV data into Vertica is as follows
# csv_converter: Module that performs format conversion of CSV files (delimiters, end-of-line characters) # load_csv.py: Load data using COPY command (python) cat ${INPUT_CSV_FILE} | csv_converter | load_csv.py
At this time, if an error occurs in csv_converter
, load with the COPY command.
I want to force termination and rollback.
With the current method, if an invalid record is found in csv_converter
The module will terminate, but normal records prior to that will be loaded into Vertica.
Since we want to have all records registered or not registered in each CSV,
we need to make sure that all the records are loaded into Vertica.
The above behavior is undesirable.
If anyone knows of a better way to do this, we would be very interested in hearing about it.
That's all, thank you for reading.
Tagged:
0
Answers
I'm not sure how you scripted the load_csv.py, so I will offer a few options. You may be able to add one of the following to the COPY command:
ABORT ON ERROR - Specifies that COPY stops if any row is rejected. The statement is rolled back and no data is loaded.
NO COMMIT - COPY does not auto-commit. The Python script could load more batches with additional COPY...NO COMMIT and at the end, send COMMIT or ROLLBACK depending whether the upstream process succeeded.
Please see the documentation for more info: https://forum.vertica.com/discussion/242620/how-to-force-a-rollback-of-a-copy-command
dear @Bryan_H
Thank you for your response.
Currently, in load_csv.py, I am running the COPY command with the
ABORT ON ERROR
option.When loading in Vertica,
considering the case that a newline is included in an item, etc.,
the CSV is parsed in the
csv_converter
module,and then the CSV is created with the delimiter and line end character replaced with special characters.
If CSV parsing fails in
csv_converter
,an error is handled,
but I want the subsequent
load_csv.py
to terminate abnormally when that error occurs,but I am considering what to do about that linkage part.
It is possible to cause an
ABORT ON ERROR
by outputting a different number of dummy data than the number of table items,but I was wondering if there is an elegant solution.
The following example sets autocommit to false, and executes COPY statement with ABORT ON ERROR and NO COMMIT. There is no Python error handler, so if the execute throws an exception, the script exits and no rows are committed. If the execute succeeds, then "Rows Loaded" count is printed and rows are committed.
import sys
import vertica_python
conn_info = {'host': '127.0.0.1',
'user': 'x',
'password': 'x',
'database': 'x',
}
with vertica_python.connect(**conn_info) as connection:
connection.autocommit = False
cur = connection.cursor()
cur.execute("COPY forum242640(a, b) FROM LOCAL '242640_data.csv' DELIMITER ',' ABORT ON ERROR NO COMMIT", buffer_size=65536)
print("Rows loaded:", cur.fetchall())
connection.commit()