Tuning Vertica query performance during mergeout
Hi,

This question is about mergeout and its impact on query performance.

What I have found out so far:
1. Mergeout happens in strata levels, merging smaller ROS containers into bigger ones until 1 ROS container per partition/table is reached.
2. Lower-strata mergeouts finish rather quickly; higher-strata mergeouts take longer.
3. Increasing the PLANNEDCONCURRENCY of the TM pool only affects mergeout of the lower strata.
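For reference, this is roughly how I have been inspecting and adjusting the TM pool (a sketch assuming the standard v_catalog/v_monitor system tables; the concurrency value 6 is just illustrative):

```sql
-- Current settings of the built-in Tuple Mover ('tm') resource pool
SELECT name, plannedconcurrency, maxconcurrency
FROM v_catalog.resource_pools
WHERE name = 'tm';

-- Raise mergeout concurrency (illustrative value; should fit core count and I/O bandwidth)
ALTER RESOURCE POOL tm PLANNEDCONCURRENCY 6 MAXCONCURRENCY 6;

-- Watch running and recent Tuple Mover mergeout operations
SELECT node_name, operation_name, projection_name, operation_status
FROM v_monitor.tuple_mover_operations
WHERE operation_name = 'Mergeout';
```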
Below are my questions:
1. Suppose there are 4 strata layers: up to which stratum can multiple mergeout threads work? Does only one thread operate on the uppermost stratum?
2. Does mergeout happen in the WOS? If it does, is it a good idea to make mergeout run more frequently than moveout?
3. If I have set both MergeOutInterval = 600 and MoveOutSizePct = '20%', which parameter takes precedence?
4. What is the difference between ROSPerStratum and MaxROSPerStratum? Will MaxROSPerStratum override ROSPerStratum if I have set MaxMrgOutROSSizeMB to a predefined value?
5. If my query answers come from multiple small ROS containers that are yet to be merged, will increasing EXECUTIONPARALLELISM (for the analytical query pool) help in any way?
6. If I have set MaxMrgOutROSSizeMB to a particular value, can I manually merge a projection to get a bigger ROS container? For example, after archiving an older partition that has many small ROS containers, I would like to consolidate them into one.
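For the manual-merge question just above, this is what I was considering trying (a sketch; the projection name is a placeholder for my super-projection):

```sql
-- Force an immediate mergeout pass on one projection's ROS containers
-- ('my_schema.my_table_super' is a placeholder projection name)
SELECT DO_TM_TASK('mergeout', 'my_schema.my_table_super');

-- Count ROS containers per partition afterwards to see the effect
SELECT projection_name, partition_key, COUNT(*) AS ros_count
FROM v_monitor.partitions
GROUP BY projection_name, partition_key
ORDER BY ros_count DESC;
```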
Lastly, the reason for all these questions: I have a very large table (200 columns, a billion rows per week, retaining 5-6 weeks) with a partition size of around 10 GB per partition for the super-projection. Real-time loading from Kafka writes batches of around 20-30 MB, 1 or 2 batches per minute. After 20-26 hours, when the mini-ROS containers are consolidated into larger ROS containers, the performance of our queries (about 100 per minute) tanks so much that the application SLA is breached.
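To correlate the slowdown with container counts, I have been sampling something like the following (assuming v_monitor.storage_containers with a used_bytes column; spikes line up with the Kafka micro-batches, drops with mergeout passes):

```sql
-- Per-projection ROS container counts and sizes, biggest offenders first
SELECT node_name, projection_name,
       COUNT(*) AS ros_containers,
       SUM(used_bytes) / 1024 / 1024 AS used_mb
FROM v_monitor.storage_containers
GROUP BY node_name, projection_name
ORDER BY ros_containers DESC;
```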
Please feel free to correct me where I am wrong. Any advice for performance improvement would be helpful.
Let me know for any clarifications.
Below is a closely related question, still unanswered on Stack Exchange:
"How can I reduce the Tuple Mover's disk I/O impact for mergeouts?"
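For completeness, the parameters mentioned above are what I have been setting via SET_CONFIG_PARAMETER (the values shown are illustrative, not recommendations):

```sql
-- Run mergeout every 600 seconds
SELECT SET_CONFIG_PARAMETER('MergeOutInterval', 600);

-- Cap the size of ROS containers that mergeout produces (value illustrative)
SELECT SET_CONFIG_PARAMETER('MaxMrgOutROSSizeMB', 10240);
```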
Thanks,
Minat