
Vertica 6.1.3-9 Merge Performance Deteriorates Over Time

In Vertica 6.1.3-9, we are running a merge 3 times a day in an ETL job that has a source table of 8-11 million rows and a target table of 120 million rows. Most (90% or more) of the source rows are matches that will update the target. Both tables have default projections that are ordered by and segmented (hash) on the merge key; the tables are not partitioned. (We know about the optimized merge article, but it applies to version 7.1, which we do not currently have available, and we don't expect to be upgrading in the near term.)
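For reference, the projection shape described above would look roughly like this (table, projection, and column names are stand-ins, with "id" standing in for the merge key):

-- default superprojection: sorted and hash-segmented on the merge key
CREATE PROJECTION target_table_super
AS SELECT * FROM target_table
ORDER BY id
SEGMENTED BY HASH(id) ALL NODES;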


With a newly created and populated target table, the merge runs in roughly 4 minutes (impressive considering the data volume). However, run times deteriorate from there. For example, after one day we see times in the 40-minute range, and it keeps getting worse day by day until the merge will no longer complete within the 2-hour window we have available. If we perform what amounts to a full table reorganization (rename the target table, create a fresh one, and copy all the data using INSERT ... SELECT), performance goes back to around 4 minutes for the next run.
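For context, the reorganization workaround amounts to something like this (the names and the two-column layout are placeholders for our real DDL):

-- move the degraded table out of the way
ALTER TABLE target_table RENAME TO target_table_old;

-- recreate it with the same layout (placeholder columns)
CREATE TABLE target_table (
    id  INT NOT NULL,
    val INT
)
ORDER BY id
SEGMENTED BY HASH(id) ALL NODES;

-- copy the rows straight into ROS, then drop the old copy
INSERT /*+ direct */ INTO target_table SELECT id, val FROM target_table_old;
COMMIT;
DROP TABLE target_table_old;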


Having many, many years in data management, I recognize that the deterioration we are seeing is consistent with table fragmentation, but I also know that isn't supposed to happen in Vertica's ROS (and we are using the hint for a direct merge, so it writes to ROS instead of going through the WOS first).
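For completeness, the merge statement has this general shape (table and column names here are illustrative, not our real schema):

MERGE /*+ direct */ INTO target_table t
USING source_table s
ON s.id = t.id
WHEN MATCHED THEN UPDATE SET val = s.val
WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val);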


Any suggestions?

Comments


    Could this be because of the delete vectors your merges generate? Since you are generating delete vectors, performance will degrade over time if these vectors are not removed/purged.

    Your performance comes back to normal after you recreate the table because dropping the old table also drops its delete vectors.


    See this example:

    -- check for delete vectors
    (dbadmin@:5433) [dbadmin] > select * from delete_vectors;
    (No rows)

    -- run the merge command
    (dbadmin@:5433) [dbadmin] *> MERGE INTO target_table t USING source_table s ON s.id=t.id
    dbadmin-> WHEN MATCHED THEN UPDATE SET count=s.count+t.count;
    -[ RECORD 1 ]
    OUTPUT | 1

    -- data was altered
    (dbadmin@:5433) [dbadmin] > select * from target_table;
    -[ RECORD 1 ]
    id    | 1
    count | 2

    -- delete vectors have been created
    (dbadmin@:5433) [dbadmin] *> select * from delete_vectors;
    -[ RECORD 1 ]-----+-------------------
    node_name         | v_dcg023_node0001
    schema_name       | public
    projection_name   | target_table_super
    storage_type      | DVWOS
    dv_oid            | 45035996275176827
    storage_oid       | 45035996275176763
    deleted_row_count | 1
    used_bytes        | 16384
    start_epoch       | 1144
    end_epoch         | 1144
    is_sorted         | t

    -- drop table
    (dbadmin@:5433) [dbadmin] *> drop table target_table;
    DROP TABLE

    -- check again for delete vectors
    (dbadmin@:5433) [dbadmin] > select * from delete_vectors;
    (No rows)

    -- no delete vectors!!!


    So I think you should look into this!

    Use the query below to generate PURGE_PROJECTION statements for every projection that has delete vectors; your purge policy should also be looked into.

    SELECT 'SELECT PURGE_PROJECTION(' || ''''
           || schema_name || '.' || projection_name || ''''
           || ') AS ' || projection_name || ';'
    FROM v_monitor.delete_vectors
    WHERE schema_name IN ('your_schema_name');

    This will return the script you need to execute.
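    Alternatively, assuming the table name from the example above, you can purge directly. Note that PURGE can only remove delete vectors older than the Ancient History Mark (AHM), so if the AHM is lagging you may need to advance it first:

    -- see how far the AHM lags behind the current epoch
    SELECT GET_AHM_EPOCH(), GET_CURRENT_EPOCH();

    -- advance the AHM as far as possible (all nodes must be up)
    SELECT MAKE_AHM_NOW();

    -- purge all delete vectors for the table
    SELECT PURGE_TABLE('public.target_table');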
