Vertica Performance degrade while adding nodes
PavanVejju
Vertica Customer ✭
step 1:
Created node aded few tables and executed our queries==> observed that little bit slow.
Step2:
Added host to the cluster and then this node added to database, so now it is 2 node cluster.
Then observed the performance,
For single node, it took 40 sec where as after added 2nd node, it degrade the performance i.e, it took 10mins+ .
After couple of hrs, i removed node 2 and tested, its came normal.
why this odd behavour? do I need to update any configurations?
0
Answers
Was rebalance completed after you added another node?
Yes @LenoyJ
It rebalanced,
Comparing the query plans of the two query runs may help you identify the difference.
Thank for for replied
It is scanning double,
For 1 node, the cost is 11M, where as for 2node cluster, the scan cost is 23M.Why this will happen?
Can you refer below
**Cluster Environment: **
+-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [ Cost: 23M, Rows: 78M ] (PATH ID: 1)
| Aggregates: count(DISTINCT dms.member_id_num)
**prod(1node env) **
+-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [ Cost: 11M, Rows: 76M ] (PATH ID: 1)
| Aggregates: count(DISTINCT dms.member_id_num)
@PavanVejju Can you share the query and projection design used in query profile or you can review based on the below information.
Examples
Assume the following projection:
CREATE PROJECTION … SEGMENTED BY HASH(a,b) ALL NODES ;
The following table explains whether or not resegmentation occurs at run time and why.
GROUP BY a : Requires resegmentation at run time. The query does not contain all the projection segmentation columns.
GROUP BY a, b : Does not require resegmentation at run time. The GROUP BY clause contains all the projection segmentation columns.
@Nimmi_gupta
https://pastebin.com/ggpjWa3G
I have added query, ddl with projections
Hi @LenoyJ
As we are dealing with Millions of records and need exact distinct count, so we are unable to use APPROXIMATE_COUNT_DISTINCT function where as Count distinct is taking lot of time.
For two or three table joins, how can we do create projections?
@PavanVejju we did review the projection design and query. Try both the projection segmented including column game and member_id_num.
@PavanVejju The query you have shared, can you enable the time and run the query on both the cluster and confirm how much is the different in execution?
EXPLAIN SELECT COUNT(DISTINCT dms.member_id_num) AS unique_players , b.game AS game FROM pp_vertica.daily_member_summary_modification dms
LEFT JOIN pp_vertica.game_info_modification b ON (b.game_id = dms.game_id_num AND dms.server_id = b.server_id AND dms.product_id = b.product_id)
WHERE dms.summary >= '2020-12-01' AND dms.summary < '2021-01-01' GROUP BY b.game;
@PavanVejju Can you log a support case for this ?
it's probably worth pointing out that this thread is over 2 years old. Also, you'd never go to a 2-node cluster in reality. There's no benefit in that. All you're doing is adding network latency. If you're going to expand from 1 node, you should go to 3 so you get fault tolerance in the form of K-safety.
Addressing the specifics of the OP's original question is difficult without understanding the projection design, and the query.