Mergeout operation does not run on time?

spektom · April 2014

Hello,

We're getting a "too many ROS containers" error during MERGE operations. I tried to fix that by setting MergeOutInterval to 300, but the mergeout operation doesn't seem to run at the specified frequency (the current time is 16:25):

aa1=> select * from v_monitor.tuple_mover_operations where projection_name='client_dim_node0001' and operation_name='Mergeout' order by operation_start_timestamp desc limit 10;
   operation_start_timestamp   |   node_name    | operation_name |         operation_status          | table_schema | table_name |   projection_name   |   projection_id   | column_id | earliest_container_start_epoch | latest_container_end_epoch | ros_count | total_ros_used_bytes |   plan_type   |       session_id       | is_executing | runtime_priority 
-------------------------------+----------------+----------------+-----------------------------------+--------------+------------+---------------------+-------------------+-----------+--------------------------------+----------------------------+-----------+----------------------+---------------+------------------------+--------------+------------------
 2014-04-27 15:56:52.262467+03 | v_aa1_node0001 | Mergeout       | Complete                          | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         985929 |                     987488 |        84 |             24547348 | Replay Delete | idc-sci-2-2557:0x28e9c | f            | MEDIUM
 2014-04-27 15:56:37.932777+03 | v_aa1_node0001 | Mergeout       | Change plan type to Replay Delete | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         985929 |                     987488 |        84 |             24547348 | Replay Delete | idc-sci-2-2557:0x28e9c | f            | MEDIUM
 2014-04-27 15:56:28.522484+03 | v_aa1_node0001 | Mergeout       | Start                             | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         985929 |                     987488 |        84 |             24547348 | Mergeout      | idc-sci-2-2557:0x28e9c | f            | MEDIUM
 2014-04-27 15:56:28.516812+03 | v_aa1_node0001 | Mergeout       | Complete                          | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         985918 |                     987568 |       249 |             41446397 | Replay Delete | idc-sci-2-2557:0x28e9c | f            | MEDIUM
 2014-04-27 15:55:14.129814+03 | v_aa1_node0001 | Mergeout       | Change plan type to Replay Delete | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         985918 |                     987568 |       249 |             41446397 | Replay Delete | idc-sci-2-2557:0x28e9c | f            | MEDIUM
 2014-04-27 15:53:51.177644+03 | v_aa1_node0001 | Mergeout       | Start                             | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         985918 |                     987568 |       249 |             41446397 | Mergeout      | idc-sci-2-2557:0x28e9c | f            | MEDIUM
 2014-04-27 14:02:32.294431+03 | v_aa1_node0001 | Mergeout       | Complete                          | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         906263 |                     914651 |       169 |             55822133 | Replay Delete | idc-sci-2-2557:0x27eac | f            | HIGH
 2014-04-27 14:01:22.314386+03 | v_aa1_node0001 | Mergeout       | Change plan type to Replay Delete | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         906263 |                     914651 |       169 |             55822133 | Replay Delete | idc-sci-2-2557:0x27eac | f            | HIGH
 2014-04-27 14:00:49.654613+03 | v_aa1_node0001 | Mergeout       | Start                             | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         906263 |                     914651 |       169 |             55822133 | Mergeout      | idc-sci-2-2557:0x27eac | f            | HIGH
 2014-04-27 14:00:49.640884+03 | v_aa1_node0001 | Mergeout       | Complete                          | public       | client_dim | client_dim_node0001 | 45035996274167882 |         0 |                         906915 |                     914607 |       249 |             72754494 | Replay Delete | idc-sci-2-2557:0x27eac | f            | HIGH
(10 rows)

Please advise how I can make sure that ROS containers are merged on time.

Thanks,
Michael
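
For readers following along, here is a minimal sketch of how the setting described in this post could be applied and verified. It assumes the older SET_CONFIG_PARAMETER syntax and the projection_storage system table; exact syntax can differ between Vertica versions. The schema and table names are taken from the query output above.

```sql
-- Assumption: older-style syntax for changing a configuration parameter.
-- MergeOutInterval is the number of seconds the Tuple Mover waits
-- between checks for mergeout work.
SELECT SET_CONFIG_PARAMETER('MergeOutInterval', 300);

-- Confirm the value that is actually in effect.
SELECT parameter_name, current_value, default_value
FROM v_monitor.configuration_parameters
WHERE parameter_name = 'MergeOutInterval';

-- Show how many ROS containers each projection of the table currently has.
SELECT node_name, projection_name, ros_count
FROM v_monitor.projection_storage
WHERE anchor_table_schema = 'public'
  AND anchor_table_name = 'client_dim'
ORDER BY ros_count DESC;
```

Note that the interval only controls how often the Tuple Mover looks for work; a mergeout still runs only when it finds containers it considers eligible.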

Daniel_Leybovic · April 2014

Hi!

Can you run the print_next_mergeout_job function?

daniel=> select print_next_mergeout_job();
                                                          print_next_mergeout_job                                                           
--------------------------------------------------------------------------------------------------------------------------------------------
 Site v_dev_node0001:

 Eligible for mergeout: 
No ROSes eligible for mergeout
 Eligible for dv mergeout: 
No DVROSes eligible for mergeout
(1 row)

>> We're getting "too many ROS containers" during MERGE operation
This looks like the "STRATA ISSUE", but I'm not sure.

1. If you don't plan to grow the cluster, you should disable the "Scale Factor". That will reduce the number of segments.

2. Check whether you have hit what is called the "STRATA ISSUE".
Zvika explains the "STRATA ISSUE" in this video (at 00:41:00):

https://www.youtube.com/watch?v=ISa9BNGK1Dg

3. MERGEOUT can't consolidate all containers (in the case of the "STRATA ISSUE"), but PURGE can, so you may need to run PURGE from time to time.

PS
You were warned about "Too many ROS": https://community.vertica.com/vertica/topics/partition_by_timestamptz_field_how
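
The suggestions in points 1 and 3 can be sketched roughly as follows. This is only an illustration: it assumes that "Scale Factor" refers to the elastic cluster scaling factor and local segmentation, and it reuses the table name from the original question. Check the documentation for your Vertica version before running any of it.

```sql
-- Assumption: "Scale Factor" means the elastic cluster scaling factor.
-- Inspect the current elastic cluster settings first.
SELECT is_enabled, is_local_segment_enabled, scaling_factor
FROM v_catalog.elastic_cluster;

-- One possible reading of point 1: turn off local segmentation
-- so that each node stores fewer, larger segments.
SELECT DISABLE_LOCAL_SEGMENTS();

-- Ask the Tuple Mover to run a mergeout on the table right away
-- instead of waiting for the next scheduled check.
SELECT DO_TM_TASK('mergeout', 'public.client_dim');

-- Point 3: purge the table. This permanently removes deleted rows
-- that are older than the Ancient History Mark.
SELECT PURGE_TABLE('public.client_dim');
```

Re-running the tuple_mover_operations query from the first post afterwards shows whether the ros_count for the projection has dropped.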

spektom · April 2014

Hi Daniel,

I've disabled the scale factor (indeed, we're running on a single node currently), and it reduced the number of partitions to ~160.

I tried tuning the default resource pool settings for the tuple mover (number of threads and memory), and that seems to have fixed the issue for the time being. I'll let it run and then report back in this thread on whether the issue has been fixed completely.

Thanks for your help and for the great video!
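
The exact settings used are not given in this thread. As a rough sketch, tuning the built-in Tuple Mover pool (named tm) might look like the following; the memory size and concurrency values are hypothetical placeholders, not the values the poster used.

```sql
-- Inspect the current settings of the built-in Tuple Mover pool.
SELECT name, memorysize, maxmemorysize, plannedconcurrency, maxconcurrency
FROM v_catalog.resource_pools
WHERE name = 'tm';

-- Hypothetical values: give the Tuple Mover more memory and allow
-- more concurrent Tuple Mover operations ("threads").
ALTER RESOURCE POOL tm MEMORYSIZE '1G' MAXCONCURRENCY 4;
```

Memory reserved for the tm pool is no longer available to other pools, so values like these need to be sized against the memory of the node.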
