
Vertica 6.1.3-9 Merge Performance Deteriorates Over Time

In Vertica 6.1.3-9, we are running a merge 3 times a day in an ETL job that has a source table of 8-11 million rows and a target table of 120 million rows. Most (90% or more) of the source rows are matches that will update the target. Both tables have default projections that are ordered by and segmented (hash) on the merge key; the tables are not partitioned. (We know about the optimized merge article, but it applies to version 7.1, which we do not currently have available, and we don't expect to be upgrading in the near term.)
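For reference, the projection shape described above would look roughly like this (table, projection, and column names are stand-ins, with "id" standing in for the merge key):

-- default superprojection: sorted and hash-segmented on the merge key
CREATE PROJECTION target_table_super
AS SELECT * FROM target_table
ORDER BY id
SEGMENTED BY HASH(id) ALL NODES;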


With a newly created and populated target table, the merge runs in roughly 4 minutes (impressive considering the data volume). However, run times deteriorate from there. For example, after one day we see times in the 40-minute range, and it keeps getting worse day by day until the merge will no longer complete within the 2-hour window we have available. If we perform what amounts to a full table reorganization (rename the target table, create a fresh one, and copy all the data using INSERT ... SELECT), performance goes back to around 4 minutes for the next run.
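For context, the reorganization workaround amounts to something like this (the names and the two-column layout are placeholders for our real DDL):

-- move the degraded table out of the way
ALTER TABLE target_table RENAME TO target_table_old;

-- recreate it with the same layout (placeholder columns)
CREATE TABLE target_table (
    id  INT NOT NULL,
    val INT
)
ORDER BY id
SEGMENTED BY HASH(id) ALL NODES;

-- copy the rows straight into ROS, then drop the old copy
INSERT /*+ direct */ INTO target_table SELECT id, val FROM target_table_old;
COMMIT;
DROP TABLE target_table_old;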


Having many, many years in data management, I recognize that the deterioration we are seeing is consistent with table fragmentation, but I also know that isn't supposed to happen in Vertica's ROS (and we are using the hint for a direct merge, so it writes to ROS instead of going through the WOS first).
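For completeness, the merge statement has this general shape (table and column names here are illustrative, not our real schema):

MERGE /*+ direct */ INTO target_table t
USING source_table s
ON s.id = t.id
WHEN MATCHED THEN UPDATE SET val = s.val
WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val);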


Any suggestions?

Comments


    Could this be because of the delete vectors your merges generate? Since you are generating delete vectors, performance will degrade over time if these vectors are not removed/purged.

    Your performance comes back to normal after you recreate the table because dropping the old table also drops its delete vectors.


    See this example:

    -- check for delete vectors
    (dbadmin@:5433) [dbadmin] > select * from delete_vectors;
    (No rows)

    -- run the merge command
    (dbadmin@:5433) [dbadmin] *> MERGE INTO target_table t USING source_table s ON s.id=t.id
    dbadmin-> WHEN MATCHED THEN UPDATE SET count=s.count+t.count;
    -[ RECORD 1 ]
    OUTPUT | 1

    -- data was altered
    (dbadmin@:5433) [dbadmin] > select * from target_table;
    -[ RECORD 1 ]
    id    | 1
    count | 2

    -- delete vectors have been created
    (dbadmin@:5433) [dbadmin] *> select * from delete_vectors;
    -[ RECORD 1 ]-----+-------------------
    node_name         | v_dcg023_node0001
    schema_name       | public
    projection_name   | target_table_super
    storage_type      | DVWOS
    dv_oid            | 45035996275176827
    storage_oid       | 45035996275176763
    deleted_row_count | 1
    used_bytes        | 16384
    start_epoch       | 1144
    end_epoch         | 1144
    is_sorted         | t

    -- drop table
    (dbadmin@:5433) [dbadmin] *> drop table target_table;
    DROP TABLE

    -- check again for delete vectors
    (dbadmin@:5433) [dbadmin] > select * from delete_vectors;
    (No rows)

    -- no delete vectors!!!


    So I think you should look into this!

    Use the query below to generate PURGE_PROJECTION statements for every projection that has delete vectors; your purge policy should also be looked into.

    SELECT 'SELECT PURGE_PROJECTION(' || ''''
           || schema_name || '.' || projection_name || ''''
           || ') AS ' || projection_name || ';'
    FROM v_monitor.delete_vectors
    WHERE schema_name IN ('your_schema_name');

    This will return the script you need to execute.
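    Alternatively, assuming the table name from the example above, you can purge directly. Note that PURGE can only remove delete vectors older than the Ancient History Mark (AHM), so if the AHM is lagging you may need to advance it first:

    -- see how far the AHM lags behind the current epoch
    SELECT GET_AHM_EPOCH(), GET_CURRENT_EPOCH();

    -- advance the AHM as far as possible (all nodes must be up)
    SELECT MAKE_AHM_NOW();

    -- purge all delete vectors for the table
    SELECT PURGE_TABLE('public.target_table');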
