Vertica 6.1.3-9 Merge Performance Deteriorates Over Time
In Vertica 6.1.3-9, we are running a merge 3 times a day in an ETL job that has a source table of 8-11 million rows and a target table of 120 million rows. Most (90% or more) of the source rows are matches that will update the target. Both tables have default projections that are ordered by and segmented (hash) on the merge key; the tables are not partitioned. (We know about the optimized-merge article, but it applies to version 7.1, which we do not currently have available, and we don't expect to be upgrading in the near term.)
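For reference, the merge statement looks roughly like this; the table and column names are placeholders, not our actual schema:

```sql
-- Sketch of the 3x-daily merge (placeholder names).
-- The /*+ direct */ hint writes the result straight to ROS, bypassing WOS.
MERGE /*+ direct */ INTO etl.target t
USING etl.source s
   ON t.merge_key = s.merge_key
WHEN MATCHED THEN
   UPDATE SET col_a = s.col_a, col_b = s.col_b
WHEN NOT MATCHED THEN
   INSERT (merge_key, col_a, col_b) VALUES (s.merge_key, s.col_a, s.col_b);
```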
With a newly created and populated target table, the merge runs in roughly 4 minutes (impressive considering the data volume). However, run times start to deteriorate from there. For example, after 1 day we see times in the 40-minute range, and it keeps getting worse day by day until the merge will no longer complete within the 2-hour window we have available. If we perform what amounts to a full table reorganization (rename the target table, create a fresh one, and copy all the data using INSERT as SELECT), performance goes back to around 4 minutes for the next run.
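The reorganization workaround, sketched with the same placeholder names (the real DDL is elided):

```sql
-- Full rebuild: rename, recreate, and reload the target (placeholder names).
ALTER TABLE etl.target RENAME TO target_old;
CREATE TABLE etl.target (/* same DDL and projections as the original */);
INSERT /*+ direct */ INTO etl.target SELECT * FROM etl.target_old;
DROP TABLE etl.target_old;
```

After this rebuild, the next merge is back to the ~4-minute baseline.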
Having spent many, many years in data management, I recognize that the deterioration we are seeing is consistent with table fragmentation, but I also know that isn't supposed to happen in Vertica ROS (and we are using the hint for a direct merge, so it is writing to ROS instead of going through WOS first).
Any suggestions?
Comments
Could this be because of the delete vectors you generate on your merges? Since you are generating delete vectors, performance will degrade over time if those vectors are not removed/purged.
Since your performance comes back to normal after you drop and rebuild the table, that action also drops the delete vectors.
See this example:
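For instance, a query against the DELETE_VECTORS system table shows the delete vectors piling up per projection (a sketch; `v_monitor.delete_vectors` and these columns are standard Vertica system objects):

```sql
-- How many delete vectors, and how many deleted rows, each projection carries.
SELECT schema_name,
       projection_name,
       COUNT(*)               AS delete_vector_count,
       SUM(deleted_row_count) AS deleted_rows
  FROM v_monitor.delete_vectors
 GROUP BY schema_name, projection_name
 ORDER BY deleted_rows DESC;
```

If the target table's projections show deleted-row counts growing merge after merge, that would explain the slowdown.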
- so I think you should look into this!
Use this to see which tables need purging; the purge policy should also be looked into. It will return the script you need to execute.
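A sketch of what such a generator query could look like, joining DELETE_VECTORS to the projections catalog to map each projection back to its anchor table (adjust the filter to taste; note that PURGE_TABLE only reclaims rows whose delete epoch is older than the Ancient History Mark, so running MAKE_AHM_NOW() first may be needed):

```sql
-- Generate one PURGE_TABLE call per table that currently has delete vectors.
SELECT DISTINCT
       'SELECT PURGE_TABLE(''' || p.anchor_table_schema || '.'
                               || p.anchor_table_name   || ''');' AS purge_cmd
  FROM v_monitor.delete_vectors dv
  JOIN v_catalog.projections p
    ON dv.projection_name = p.projection_name
   AND dv.schema_name     = p.projection_schema;
```

Running the returned statements drops the delete vectors, which should give you the same speedup as the full rebuild without copying 120 million rows.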