How to deal with "DDL statement interfered with this statement"
Hi Guys!
I need help with a bulk load. I have a Python script pumping data from an external server into my Vertica DB every minute. The script creates a CSV file and loads it into a temp table, then runs a MERGE between the two tables (the temp and the main table). This way duplicates are not inserted (the MERGE command inserts only rows that don't already exist in the main table).
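Roughly, the per-minute cycle looks like this (a simplified sketch with placeholder table and column names, not my real schema):

-- Staging table with the same schema as the main table
CREATE TABLE events_tmp LIKE events;
-- Load the minute's CSV batch into staging
COPY events_tmp FROM '/tmp/batch.csv' DELIMITER ',';
-- Insert only the rows that are not already in the main table
MERGE INTO events tgt USING events_tmp src
    ON tgt.id = src.id
WHEN NOT MATCHED THEN INSERT (id, ts, payload)
    VALUES (src.id, src.ts, src.payload);
TRUNCATE TABLE events_tmp;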
A year ago this worked great, but the main table now has over 3 billion rows. Too many rows are loaded each minute, so one MERGE command gets interrupted by another MERGE; the inserts don't complete, and I get many, many failed queries with the error "DDL statement interfered with this statement".
I'm looking for a workaround for this situation. If I don't let several MERGE commands run in parallel, the data will load too slowly.
If I use a COPY command instead, will the whole file be rejected when there is a duplicate row, or will I just get errors for those rows? How do I check that?
I need to insert thousands (sometimes millions) of rows into my main table all day long, while making sure duplicates are not created. It should run reasonably fast, without the above error from one command locking the main table.
I don't know if it helps, but my main table has a primary-key constraint projection.
Is there a smart way to do so?
Any help will be greatly appreciated.
Kind regards,
Keren
Answers
Q - If I use a COPY command, will the whole file be rejected when there is a duplicate row, or will I just get errors?
A - The whole COPY statement will fail.
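To check this yourself: the REJECTED DATA and EXCEPTIONS options capture rows rejected for parse errors, but a violation of an enforced primary key rolls back the entire statement. A minimal sketch (file paths and table name are placeholders); if you instead leave the constraint unenforced, duplicates load silently and you can detect them afterwards with ANALYZE_CONSTRAINTS:

-- Parse-level rejects go to the side files; an enforced-PK
-- violation still fails the whole COPY.
COPY events FROM '/data/batch.csv' DELIMITER ','
    REJECTED DATA '/data/batch.rejected'
    EXCEPTIONS '/data/batch.exceptions';

-- With an unenforced PK, check for duplicates after the load:
SELECT ANALYZE_CONSTRAINTS('events');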
Please consider the following:
1. Use sequences for primary key columns to guarantee uniqueness; this avoids the resource overhead that enforced primary key constraints can incur (see the sequence sketch after this list).
2. An optimized MERGE is much faster than a non-optimized MERGE statement.
See: https://www.vertica.com/docs/11.1.x/HTML/Content/Authoring/AdministratorsGuide/Tables/MergeTables/OptimizedVsNonOptimizedMerge.htm
3. To make COPY statements load faster, add nodes to the cluster.
4. To avoid DELETEs, UPDATEs, and MERGEs, you can use a Live Aggregate Projection (LAP) / Top-K projection that keeps only the latest version of each row (see the Top-K sketch after this list).
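For item 1, a minimal sketch of a sequence-backed key (all names are placeholders):

CREATE SEQUENCE events_id_seq;
-- New rows get a unique id from the sequence by default
CREATE TABLE events_v2 (
    id      INT DEFAULT NEXTVAL('events_id_seq'),
    ts      TIMESTAMP,
    payload VARCHAR(1000)
);

For item 4, a Top-K projection can keep just the latest row per key, so you load with plain COPY (duplicates and all) and read the deduplicated data from the projection instead of running MERGE. A minimal sketch, assuming a key column id and a timestamp ts:

-- Keep only the most recent row for each id
CREATE PROJECTION events_latest AS
    SELECT id, ts, payload
    FROM events
    LIMIT 1 OVER (PARTITION BY id ORDER BY ts DESC);

(If created on a table that already holds data, the projection must be refreshed, e.g. with SELECT START_REFRESH();, before it is up to date.)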