Vertica Performance degrade while adding nodes

PavanVejju · March 2021

step 1:
Created node aded few tables and executed our queries==> observed that little bit slow.
Step2:
Added host to the cluster and then this node added to database, so now it is 2 node cluster.
Then observed the performance,
For single node, it took 40 sec where as after added 2nd node, it degrade the performance i.e, it took 10mins+ .
After couple of hrs, i removed node 2 and tested, its came normal.

why this odd behavour? do I need to update any configurations?

LenoyJ · March 2021

Was rebalance completed after you added another node?

PavanVejju · March 2021

Yes @LenoyJ
It rebalanced,

LenoyJ · March 2021

Comparing the query plans of the two query runs may help you identify the difference.

PavanVejju · March 2021

Thank for for replied

It is scanning double,
For 1 node, the cost is 11M, where as for 2node cluster, the scan cost is 23M.Why this will happen?
Can you refer below

**Cluster Environment: **
+-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [ Cost: 23M, Rows: 78M ] (PATH ID: 1)
| Aggregates: count(DISTINCT dms.member_id_num)

**prod(1node env) **
+-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [ Cost: 11M, Rows: 76M ] (PATH ID: 1)
| Aggregates: count(DISTINCT dms.member_id_num)

Nimmi_gupta · March 2021

@PavanVejju Can you share the query and projection design used in query profile or you can review based on the below information.
Examples
Assume the following projection:
CREATE PROJECTION … SEGMENTED BY HASH(a,b) ALL NODES ;
The following table explains whether or not resegmentation occurs at run time and why.
GROUP BY a : Requires resegmentation at run time. The query does not contain all the projection segmentation columns.
GROUP BY a, b : Does not require resegmentation at run time. The GROUP BY clause contains all the projection segmentation columns.

LenoyJ · March 2021

Research on global & local re-segmentation. See: https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/AnalyzingData/Optimizations/AvoidingResegmentationDuringGROUPBYOptimizationWithProjectionDesign.htm
Tuning your projection for your query is important. Remember that when moving from a single node to a multi-node cluster, the segmentation clause in your projections start playing a big role.
Tuning your query also helps. It looks like you are using COUNT DISTINCT -> https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/AnalyzingData/Optimizations/OptimizingDISTINCTInASELECTQueryList.htm
Make sure you analyze statistics - this helps create better query plans.

PavanVejju · March 2021

@Nimmi_gupta
https://pastebin.com/ggpjWa3G
I have added query, ddl with projections

PavanVejju · March 2021

Hi @LenoyJ

As we are dealing with Millions of records and need exact distinct count, so we are unable to use APPROXIMATE_COUNT_DISTINCT function where as Count distinct is taking lot of time.
For two or three table joins, how can we do create projections?

Nimmi_gupta · March 2021

@PavanVejju we did review the projection design and query. Try both the projection segmented including column game and member_id_num.

Nimmi_gupta · March 2021

@PavanVejju The query you have shared, can you enable the time and run the query on both the cluster and confirm how much is the different in execution?

EXPLAIN SELECT COUNT(DISTINCT dms.member_id_num) AS unique_players , b.game AS game FROM pp_vertica.daily_member_summary_modification dms
LEFT JOIN pp_vertica.game_info_modification b ON (b.game_id = dms.game_id_num AND dms.server_id = b.server_id AND dms.product_id = b.product_id)
WHERE dms.summary >= '2020-12-01' AND dms.summary < '2021-01-01' GROUP BY b.game;

Nimmi_gupta · March 2021

@PavanVejju Can you log a support case for this ?

Vertica_Curtis · May 2023

it's probably worth pointing out that this thread is over 2 years old. Also, you'd never go to a 2-node cluster in reality. There's no benefit in that. All you're doing is adding network latency. If you're going to expand from 1 node, you should go to 3 so you get fault tolerance in the form of K-safety.

Addressing the specifics of the OP's original question is difficult without understanding the projection design, and the query.

We're Moving!

Create My New Community Account Now

Vertica Performance degrade while adding nodes

Answers

Leave a Comment