Options

when rows get removed physically from vertica

Please forgive the lenght of this post, but I wanted to include enough details.

I'm trying to get a better understanding of when rows get physcially removed

select schema_name, projection_name, sum(deleted_row_count)

from delete_vectors

group by schema_name, projection_name

order by 3 desc

limit 1;

 

schema_name |                 projection_name                 |    sum   

-------------+-------------------------------------------------+-----------

QIW_STAGE   | TABLE1_b0                       | 152534315

So, the table has undergone a lot of deletions,  with the _b0 projection having 152 million deleted rows.

 vertica=> select count(*) from qiw_stage.table1

  count  

----------

19673884

 

And the table only has  roughly 20 million rows !!

 

I understand that Vertica will never physically delete rows unless there were deleted BEFORE the current Ancient History Mark. 

 vertica=> select get_ahm_epoch();

get_ahm_epoch

---------------

       2077987

 vertica=> select get_current_epoch();

get_current_epoch

-------------------

           2077988

 vertica=> select get_ahm_time();

                  get_ahm_time                  

-------------------------------------------------

Current AHM Time: 2014-10-26 20:02:49.976049-04

 So, the current epoch is 2077988, and the AHM is set to epoch  2077987, which corresponds to roughly 10 PM last night.

 Also, I understand that when doing mergouts,  Vertica will tend to choose the smallest containers to merge and very large containers may stick around for a long time before getting merged.   Therefore, if you have a very large container that has a lot of deleted rows, it may never actually get the deleted rows purged.   However, there is a parameter called PurgeMergeoutPercent.  If I understand what this parameter does correctly, it causes the mergeout process to consider merging containers (even big ones) whenever the amount of deleted rows gets larger than the this percentage.

 vertica=> select get_config_parameter('PurgeMergeoutPercent');

get_config_parameter

----------------------

20

This is where I really get a little confused as to what is going on.   I wanted to find out what percentage of deleted rows was in each storage container.  The STORAGE_CONTAINERS table shows each ROS container.  Included in this table is how many deleted rows are in each container.

vertica=> select node_name, projection_name, total_row_count, deleted_row_count, start_epoch, end_epoch

from storage_containers where projection_name = 'TABLE1;

       node_name        | total_row_count | deleted_row_count | start_epoch | end_epoch
------------------------+-----------------+-------------------+-------------+-----------
 v_vertica_dev_node0001 |          165984 |            165984 |     1971366 |   1971366
 v_vertica_dev_node0001 |        45536661 |          45536661 |     1968722 |   1971405
 v_vertica_dev_node0001 |          166445 |            166445 |     1976713 |   1976713
 v_vertica_dev_node0001 |          201824 |            201824 |     1976745 |   1976746

. .

v_vertica_dev_node0001 |          999299 |                 0 |     2077509 |   2077514
 v_vertica_dev_node0001 |          166445 |                 0 |     2077515 |   2077515
 v_vertica_dev_node0001 |         1333435 |                 0 |     2077516 |   2077523
 v_vertica_dev_node0001 |         1333430 |                 0 |     2077524 |   2077531
 v_vertica_dev_node0001 |         1332719 |                 0 |     2077532 |   2077539
 v_vertica_dev_node0001 |         1333065 |                 0 |     2077540 |   2077547
 v_vertica_dev_node0001 |           57827 |                 0 |     2077548 |   2077548
...

I deleted a many of the 96 rows of output for the sake of clarity. 

If I’m reading this correctly,  all containers have either deleted all rows or no rows.   And for the first two containers listed (for example), all of the changes in this container happened LONG before the current epoch.

I know we can go through and manually purge these rows if we want to using the purge() functions, but if you use any of these on a partitioned projection, it removes the partitioning, so I’m hesitant to use this.   I’m not even sure that having all these deleted rows around is a problem.   I'm just getting the impression that these rows may never be removed unless I use one of the purge() functions.    Any insight into this behavior would be appreciated.

Comments

  • Options
    Hi Mad Zymurgist,

    Hm...  Just to confirm, have you tried doing an explicit "select do_tm_task('mergeout');", to make sure that Vertica has attempted to do a mergeout?

    My understanding is that purge() does not remove ordinary table partitioning.  It might, however, undo custom partition-level manipulations such as merge_partitions().  (Several such partition-level functions are currently deprecated -- yes, they do have their uses; but the current APIs have so many gotcha's like the one above that they tend to cause more headaches than they are worth.  See the Vertica documentation for details.)

    Adam

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file