Is it safe (performance, locking, etc) to update fields in a monthly partitioned table?
Hello. We have a table that, as of this writing, has about 4.6 billion records. I'm a software engineer working on a project that will release a job to update a field in this table for existing records. The data in the table starts on 2015-07-01, so it currently spans about two years. The table is filled by a single job with real-time data every 2 minutes. The job I'm working on will update records that are at least 1 day old. Many jobs read data from this table. Question: is it safe to release the updating job? I'm worried that while the updating job is running it could block, or significantly slow down, the reading jobs and/or the single writing job.
Comments
Hi,
Sounds like you plan on updating roughly a billion records (those older than one day) in a single transaction. In Vertica, an UPDATE is actually a DELETE plus an INSERT, so you'll be deleting ~1 billion records and inserting ~1 billion records. You can do it, but it's not going to perform very well.
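One common way to soften the impact is to break the work up along the monthly partitions and commit between batches, so each transaction only touches one month's ROS containers. A rough sketch (table and column names here are hypothetical, not from your schema):

```sql
-- Assumes the table is partitioned by month on an event timestamp.
-- Update one monthly partition per transaction instead of all 4.6B rows at once.
UPDATE my_table
   SET my_field = 'processed'          -- whatever your new value is
 WHERE event_ts >= '2015-07-01'
   AND event_ts <  '2015-08-01'        -- one monthly partition
   AND event_ts <  CURRENT_DATE;       -- only records at least 1 day old
COMMIT;
-- Repeat for 2015-08, 2015-09, ... up to yesterday.
```

Driving the batches from a small script also lets you pause between months if the real-time loader or the readers start to lag.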
The DML statements INSERT, UPDATE, and DELETE perform the same functions that they do in any ACID-compliant database, so in terms of correctness it is safe.
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AdministratorsGuide/TrickleLoading/UsingINSERTUPDATEAndDELETE.htm
Since an UPDATE includes a DELETE operation, make sure to purge the deleted data afterward.
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AdministratorsGuide/BulkDeletingPurging/PurgingDeletedData.htm
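Once the batch updates have committed, you can reclaim the space held by the delete vectors with the purge functions. A sketch (schema/table names hypothetical; the partition key string must match your table's partition expression):

```sql
-- Permanently remove deleted rows (up to the Ancient History Mark)
-- for the whole table:
SELECT PURGE_TABLE('my_schema.my_table');

-- Or purge one partition at a time to spread out the mergeout work:
SELECT PURGE_PARTITION('my_schema.my_table', '2015-07');
```

Purging partition by partition keeps each purge operation small, which matters on a table this size.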
Note that while your job is running, no other process can update or delete from the table until your process completes.
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AnalyzingData/Optimizations/PerformanceConsiderationsForDELETEAndUPDATEQueries.htm
You can perform a few steps to optimize both the DELETE and UPDATE operations.
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AnalyzingData/Optimizations/OptimizingDELETEsAndUPDATEsForPerformance.htm
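The main idea in that doc is to write the WHERE clause so it lines up with the table's partitioning (and the projections' sort order), which lets Vertica prune whole partitions instead of scanning all 4.6 billion rows. A hypothetical example, assuming the table is partitioned by month on `event_ts`:

```sql
-- Filtering on the partition expression lets the planner skip every
-- partition except the one being updated.
UPDATE my_table
   SET my_field = 'processed'
 WHERE date_trunc('month', event_ts) = '2017-03-01'::timestamp;
```

The same applies to the DELETE half of the operation: the more selective and partition-aligned the predicate, the fewer ROS containers get touched.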
Take a look at this post, "Use MERGE to Update 1 Million Rows". It might help you think of other ways to update the data faster:
See:
https://my.vertica.com/blog/use-merge-update-1-million-rows/
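The MERGE approach boils down to staging the new values in a side table and applying them in one set-based statement. A minimal sketch, with hypothetical table and column names:

```sql
-- Stage the new values first (load via COPY or INSERT ... SELECT).
CREATE LOCAL TEMP TABLE new_values (id INT, my_field VARCHAR(50))
    ON COMMIT PRESERVE ROWS;

-- Apply them in a single MERGE; only matched rows are rewritten.
MERGE INTO my_table t
USING new_values n
   ON t.id = n.id
WHEN MATCHED THEN
    UPDATE SET my_field = n.my_field;
```

Combined with the batching idea above (one month's worth of staged values per MERGE), this tends to be gentler on concurrent readers than one giant UPDATE.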