Mergeout operation does not run on time?

Hello,

We're getting "too many ROS containers" during a MERGE operation. I've tried to fix that by setting MergeOutInterval to 300, but the mergeout operation doesn't seem to run at the specified frequency (the current time is 16:25):
aa1=> select * from v_monitor.tuple_mover_operations where projection_name='client_dim_node0001' and operation_name='Mergeout' order by operation_start_timestamp desc limit 10;
   operation_start_timestamp   |   node_name    | operation_name |         operation_status          | table_schema | table_name |   projection_name   |   projection_id   | column_id | earliest_container_start_epoch | latest_container_end_epoch | ros_count | total_ros_used_bytes |   plan_type   |       session_id       | is_executing | runtime_priority 
-------------------------------+----------------+----------------+-----------------------------------+--------------+------------+---------------------+-------------------+-----------+--------------------------------+----------------------------+-----------+----------------------+---------------+------------------------+--------------+------------------
 2014-04-27 15:56:52.262467+03 | v_aa1_node0001 | Mergeout       | Complete                          | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         985929 |                     987488 |        84 |             24547348 | Replay Delete | idc-sci-2-2557:0x28e9c | f            | MEDIUM
 2014-04-27 15:56:37.932777+03 | v_aa1_node0001 | Mergeout       | Change plan type to Replay Delete | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         985929 |                     987488 |        84 |             24547348 | Replay Delete | idc-sci-2-2557:0x28e9c | f            | MEDIUM
 2014-04-27 15:56:28.522484+03 | v_aa1_node0001 | Mergeout       | Start                             | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         985929 |                     987488 |        84 |             24547348 | Mergeout      | idc-sci-2-2557:0x28e9c | f            | MEDIUM
 2014-04-27 15:56:28.516812+03 | v_aa1_node0001 | Mergeout       | Complete                          | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         985918 |                     987568 |       249 |             41446397 | Replay Delete | idc-sci-2-2557:0x28e9c | f            | MEDIUM
 2014-04-27 15:55:14.129814+03 | v_aa1_node0001 | Mergeout       | Change plan type to Replay Delete | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         985918 |                     987568 |       249 |             41446397 | Replay Delete | idc-sci-2-2557:0x28e9c | f            | MEDIUM
 2014-04-27 15:53:51.177644+03 | v_aa1_node0001 | Mergeout       | Start                             | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         985918 |                     987568 |       249 |             41446397 | Mergeout      | idc-sci-2-2557:0x28e9c | f            | MEDIUM
 2014-04-27 14:02:32.294431+03 | v_aa1_node0001 | Mergeout       | Complete                          | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         906263 |                     914651 |       169 |             55822133 | Replay Delete | idc-sci-2-2557:0x27eac | f            | HIGH
 2014-04-27 14:01:22.314386+03 | v_aa1_node0001 | Mergeout       | Change plan type to Replay Delete | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         906263 |                     914651 |       169 |             55822133 | Replay Delete | idc-sci-2-2557:0x27eac | f            | HIGH
 2014-04-27 14:00:49.654613+03 | v_aa1_node0001 | Mergeout       | Start                             | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         906263 |                     914651 |       169 |             55822133 | Mergeout      | idc-sci-2-2557:0x27eac | f            | HIGH
 2014-04-27 14:00:49.640884+03 | v_aa1_node0001 | Mergeout       | Complete                          | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         906915 |                     914607 |       249 |             72754494 | Replay Delete | idc-sci-2-2557:0x27eac | f            | HIGH
(10 rows)
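A sketch for checking two things: whether the new MergeOutInterval value is actually in effect, and how far apart the mergeout starts for this projection really are. It assumes Vertica 7.x-era syntax (SET_CONFIG_PARAMETER and the v_monitor.configuration_parameters table); the projection name is the one from the output above. As far as I know, MergeOutInterval is the interval at which the tuple mover checks for mergeout work, not a guarantee that a mergeout runs every time: if no containers are eligible, nothing is merged.

```sql
-- Set the interval (in seconds) at which the tuple mover checks for mergeout work.
SELECT SET_CONFIG_PARAMETER('MergeOutInterval', 300);

-- Confirm the value actually in effect on each node.
SELECT node_name, parameter_name, current_value, default_value
FROM v_monitor.configuration_parameters
WHERE parameter_name = 'MergeOutInterval';

-- Gap between consecutive mergeout starts for the projection.
SELECT node_name,
       operation_start_timestamp,
       operation_start_timestamp
         - LAG(operation_start_timestamp) OVER (PARTITION BY node_name
                                                ORDER BY operation_start_timestamp) AS gap_since_previous_start,
       ros_count
FROM v_monitor.tuple_mover_operations
WHERE projection_name = 'client_dim_node0001'
  AND operation_name = 'Mergeout'
  AND operation_status = 'Start'
ORDER BY operation_start_timestamp DESC
LIMIT 10;
```

In the output above, mergeout starts at 14:00:49, 15:53:51 and 15:56:28, so the gap between the first two is almost two hours, much longer than 300 seconds.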
Please advise: how can I make sure that ROS containers are merged on time?

Thanks,
Michael

Comments

  • Hi!

    Can you run the print_next_mergeout_job() function? For example:
    daniel=> select print_next_mergeout_job();
    print_next_mergeout_job
    --------------------------------------------------------------------------------------------------------------------------------------------
    Site v_dev_node0001:

    Eligible for mergeout:
    No ROSes eligible for mergeout
    Eligible for dv mergeout:
    No DVROSes eligible for mergeout
    (1 row)

    >> We're getting "too many ROS containers" during MERGE operation
    This looks like the "strata issue", but I'm not sure.
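    To see how close each projection is to that error, you can count its ROS containers. This is a sketch assuming the v_monitor.storage_containers system table and its storage_type column. To my understanding, the limit behind "too many ROS containers" is the ContainersPerProjectionLimit configuration parameter (1024 by default), so the second query reads its current value.

    ```sql
    -- ROS containers per projection and node, largest first.
    SELECT node_name,
           projection_name,
           COUNT(*) AS ros_containers
    FROM v_monitor.storage_containers
    WHERE storage_type = 'ROS'
    GROUP BY node_name, projection_name
    ORDER BY ros_containers DESC
    LIMIT 10;

    -- The per-projection ROS container limit currently in effect.
    SELECT node_name, current_value
    FROM v_monitor.configuration_parameters
    WHERE parameter_name = 'ContainersPerProjectionLimit';
    ```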

    1. If you don't plan to grow the cluster, you should disable the "Scale Factor". This will reduce the number of segments.

    2. Check that you haven't run into what is called the "strata issue".
    Zvika explains the "strata issue" in this video (at 00:41:00):
    https://www.youtube.com/watch?v=ISa9BNGK1Dg


    3. Mergeout can't consolidate all containers (in the case of the "strata issue"), but PURGE does, so you may need to run PURGE periodically.
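    The three suggestions above map roughly to the statements below. This is a sketch assuming Vertica 7.x. The v_monitor.strata table shows how the tuple mover has grouped a projection's containers into strata. The elastic cluster settings in v_catalog.elastic_cluster and DISABLE_LOCAL_SEGMENTS() are, to my understanding, what the "Scale Factor" refers to. PURGE_TABLE() removes deleted rows, but only those deleted before the Ancient History Mark. The "Replay Delete" plan type in the question's output suggests the containers do carry deleted rows. The table and projection names come from the question; check the documentation for your version before running these.

    ```sql
    -- (1) Is local segmentation enabled, and with what scaling factor?
    SELECT scaling_factor, is_local_segment_enabled
    FROM v_catalog.elastic_cluster;

    -- (1) Disable local segmentation if the cluster is not going to grow.
    SELECT DISABLE_LOCAL_SEGMENTS();

    -- (2) Inspect how the projection's containers are spread across strata.
    SELECT *
    FROM v_monitor.strata
    WHERE projection_name = 'client_dim_node0001';

    -- (3) Purge deleted rows from all projections of the table.
    SELECT PURGE_TABLE('public.client_dim');
    ```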

    PS
    You were already warned about "too many ROS" in this earlier thread: https://community.vertica.com/vertica/topics/partition_by_timestamptz_field_how

  • Hi Daniel,

    I've disabled the scale factor (indeed, we're running on a single node currently), and it reduced the number of partitions to ~160.

    I tried tuning the default resource pool settings for the tuple mover (number of threads and memory), and that seems to have fixed the issue for now. I'll let it run and then report back in this thread on whether the issue is fixed completely.
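    Tuning the tuple mover pool as described might look like the sketch below. It assumes the built-in pool is named tm and that MAXCONCURRENCY controls the number of concurrent tuple mover threads, as in Vertica 7.x. The MEMORYSIZE and MAXCONCURRENCY values are hypothetical placeholders, not recommendations.

    ```sql
    -- Current tuple mover pool settings.
    SELECT name, memorysize, maxmemorysize, plannedconcurrency, maxconcurrency
    FROM v_catalog.resource_pools
    WHERE name = 'tm';

    -- Hypothetical values: give the tuple mover more memory and threads.
    ALTER RESOURCE POOL tm MEMORYSIZE '500M' MAXCONCURRENCY 4;
    ```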

    Thanks for your help and for the great video!
