Vertica Performance degrade while adding nodes

PavanVejjuPavanVejju Vertica Customer

step 1:
Created node aded few tables and executed our queries==> observed that little bit slow.
Step2:
Added host to the cluster and then this node added to database, so now it is 2 node cluster.
Then observed the performance,
For single node, it took 40 sec where as after added 2nd node, it degrade the performance i.e, it took 10mins+ .
After couple of hrs, i removed node 2 and tested, its came normal.

why this odd behavour? do I need to update any configurations?

Answers

  • LenoyJLenoyJ - Select Field - Employee

    Was rebalance completed after you added another node?

  • PavanVejjuPavanVejju Vertica Customer

    Yes @LenoyJ
    It rebalanced,

  • LenoyJLenoyJ - Select Field - Employee

    Comparing the query plans of the two query runs may help you identify the difference.

  • PavanVejjuPavanVejju Vertica Customer

    Thank for for replied

    It is scanning double,
    For 1 node, the cost is 11M, where as for 2node cluster, the scan cost is 23M.Why this will happen?
    Can you refer below

    **Cluster Environment: **
    +-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [ Cost: 23M, Rows: 78M ] (PATH ID: 1)
    | Aggregates: count(DISTINCT dms.member_id_num)

    **prod(1node env) **
    +-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [ Cost: 11M, Rows: 76M ] (PATH ID: 1)
    | Aggregates: count(DISTINCT dms.member_id_num)

  • Nimmi_guptaNimmi_gupta - Select Field - Employee

    @PavanVejju Can you share the query and projection design used in query profile or you can review based on the below information.
    Examples
    Assume the following projection:
    CREATE PROJECTION … SEGMENTED BY HASH(a,b) ALL NODES ;
    The following table explains whether or not resegmentation occurs at run time and why.
    GROUP BY a : Requires resegmentation at run time. The query does not contain all the projection segmentation columns.
    GROUP BY a, b : Does not require resegmentation at run time. The GROUP BY clause contains all the projection segmentation columns.

  • LenoyJLenoyJ - Select Field - Employee
    edited March 2021
  • PavanVejjuPavanVejju Vertica Customer

    @Nimmi_gupta
    https://pastebin.com/ggpjWa3G
    I have added query, ddl with projections

  • PavanVejjuPavanVejju Vertica Customer

    Hi @LenoyJ

    As we are dealing with Millions of records and need exact distinct count, so we are unable to use APPROXIMATE_COUNT_DISTINCT function where as Count distinct is taking lot of time.
    For two or three table joins, how can we do create projections?

  • Nimmi_guptaNimmi_gupta - Select Field - Employee

    @PavanVejju we did review the projection design and query. Try both the projection segmented including column game and member_id_num.

  • Nimmi_guptaNimmi_gupta - Select Field - Employee

    @PavanVejju The query you have shared, can you enable the time and run the query on both the cluster and confirm how much is the different in execution?

    EXPLAIN SELECT COUNT(DISTINCT dms.member_id_num) AS unique_players , b.game AS game FROM pp_vertica.daily_member_summary_modification dms
    LEFT JOIN pp_vertica.game_info_modification b ON (b.game_id = dms.game_id_num AND dms.server_id = b.server_id AND dms.product_id = b.product_id)
    WHERE dms.summary >= '2020-12-01' AND dms.summary < '2021-01-01' GROUP BY b.game;

  • Nimmi_guptaNimmi_gupta - Select Field - Employee

    @PavanVejju Can you log a support case for this ?

  • Hmm, that's an interesting situation you encountered. It seems like adding the second node to the cluster actually degraded the performance instead of improving it. There could be a few reasons for this odd behavior.

    One possibility is that the cluster configuration or settings were not properly optimized to handle the addition of a second node. It's worth checking if there are any specific configurations or adjustments that need to be made when adding nodes to the cluster. Sometimes, additional steps are required to ensure that the cluster can effectively distribute the workload among the nodes.

  • it's probably worth pointing out that this thread is over 2 years old. Also, you'd never go to a 2-node cluster in reality. There's no benefit in that. All you're doing is adding network latency. If you're going to expand from 1 node, you should go to 3 so you get fault tolerance in the form of K-safety.

    Addressing the specifics of the OP's original question is difficult without understanding the projection design, and the query.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file