Reading Query Plan And Optimizing It

suryatejasuryateja Vertica Customer

Hi Team,

I am facing hard time in understanding the bottle necks for the performance. I am looking to run this query within 15 seconds but its completing in around 2 mins. I tried partitions and didn't find them much useful. Can someone help me out on how can i optimize this for the perfromance i am looking for -

Below is the query plan -


QUERY PLAN DESCRIPTION:

EXPLAIN SELECT dim_brand_info.brand AS "brand", coalesce(dim_tables.game_type,'No Gametype') AS "game_type1", (case when (dim_tables.table_full_name ='null' or dim_tables.table_full_name is null) then 'No Table' else dim_tables.table_full_name end) AS "table_full_name1", (case when (dim_tables.table_full_name ='null' or dim_tables.table_full_name is null) then 'No Tablefullname' else dim_tables.table_full_name end) AS "table_full_name", sum(fact_live_daily_summary.ggr_euro)100/sum(fact_live_daily_summary.bets_euro) AS "net_profic_perce", count(distinct fact_live_daily_summary.game_id) AS "game_count", SUM(fact_live_daily_summary.bet_count) AS "sum_bet_count", count(distinct fact_live_daily_summary.player_id) AS "players", SUM(fact_live_daily_summary.bets_euro) AS "sum_bets_euro", SUM(fact_live_daily_summary.wins_euro) AS "sum_wins_euro", SUM(fact_live_daily_summary.ggr_euro) AS "sum_ggr_euro", sum(fact_live_daily_summary.wins_euro)100/sum(fact_live_daily_summary.bets_euro) AS "payout_perce" FROM report.fact_live_daily_summary "fact_live_daily_summary" LEFT JOIN report.dim_countries_info "dim_countries_info" ON (fact_live_daily_summary.product_id = dim_countries_info.product_id and fact_live_daily_summary.country_id = dim_countries_info.country_id) LEFT JOIN report.dim_brand_info "dim_brand_info" ON (dim_brand_info.product_id = fact_live_daily_summary.product_id and fact_live_daily_summary.brand_id = dim_brand_info.brand_id ) LEFT JOIN report.dim_tables "dim_tables" ON (fact_live_daily_summary.table_id_num = dim_tables.table_id_num) LEFT JOIN report.dim_games_info "dim_games_info" ON (fact_live_daily_summary.product_id = dim_games_info.product_id and fact_live_daily_summary.gametype_id = dim_games_info.game_id) LEFT JOIN report.fact_dealer_wise_details "fact_dealer_wise_details" ON (fact_live_daily_summary.product_id = 3 and fact_live_daily_summary.operator_id = fact_dealer_wise_details.user_id) WHERE ((fact_live_daily_summary.summary >= TRUNC(SYSDATE-1) AND fact_live_daily_summary.summary < TRUNC(SYSDATE))) GROUP BY (dim_brand_info.brand), (coalesce(dim_tables.game_type,'No Gametype')), ((case when (dim_tables.table_full_name ='null' or dim_tables.table_full_name is null) then 'No Table' else dim_tables.table_full_name end)), ((case when (dim_tables.table_full_name ='null' or dim_tables.table_full_name is null) then 'No Tablefullname' else dim_tables.table_full_name end)) ORDER BY "sum_bets_euro" DESC LIMIT 100000

EnableUniquenessOptimization is on
The following JGNodes/Tables have been pruned due to uniqueness guarantee:
Node 1 dim_countries_info (report.dim_countries_info)

Access Path:
+-SELECT LIMIT 100K [Cost: 42M, Rows: 100K] (PATH ID: 0)
| Output Only: 100000 tuples
| Execute on: Query Initiator
| +---> SORT [TOPK] [Cost: 42M, Rows: 38M] (PATH ID: 1)
| | Order: sum(fact_live_daily_summary.bets_euro) DESC
| | Output Only: 100000 tuples
| | Execute on: All Nodes
| | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 37M, Rows: 38M] (PATH ID: 2)
| | | Aggregates: count(DISTINCT fact_live_daily_summary.game_id), sum(), sum(), sum(), sum(), count(DISTINCT fact_live_daily_summary.player_id)
| | | Group By: dim_brand_info.brand, , ,
| | | Execute on: All Nodes
| | | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 28M, Rows: 38M] (PATH ID: 3)
| | | | Aggregates: sum(fact_live_daily_summary.ggr_euro), sum(fact_live_daily_summary.bets_euro), sum(fact_live_daily_summary.bet_count), sum(fact_live_daily_summary.wins_euro)
| | | | Group By: dim_brand_info.brand, coalesce(dim_tables.game_type, 'No Gametype'), CASE WHEN ((dim_tables.table_full_name = 'null') OR (dim_tables.table_full_name IS NULL)) THEN 'No Table' ELSE dim_tables.table_full_name END, CASE WHEN ((dim_tables.table_full_name = 'null') OR (dim_tables.table_full_name IS NULL)) THEN 'No Tablefullname' ELSE dim_tables.table_full_name END, fact_live_daily_summary.game_id, fact_live_daily_summary.player_id
| | | | Grouping Sets: (dim_brand_info.brand, , , , fact_live_daily_summary.game_id, , , , ), (dim_brand_info.brand, , , , fact_live_daily_summary.player_id)
| | | | Execute on: All Nodes
| | | | +---> JOIN HASH [LeftOuter] [Cost: 17M, Rows: 38M] (PATH ID: 4) Inner (BROADCAST)
| | | | | Join Cond: (dim_brand_info.product_id = fact_live_daily_summary.product_id) AND (fact_live_daily_summary.brand_id = dim_brand_info.brand_id)
| | | | | Materialize at Input: fact_live_daily_summary.product_id, fact_live_daily_summary.brand_id
| | | | | Materialize at Output: fact_live_daily_summary.player_id, fact_live_daily_summary.bet_count, fact_live_daily_summary.bets_euro, fact_live_daily_summary.wins_euro, fact_live_daily_summary.ggr_euro, fact_live_daily_summary.game_id
| | | | | Execute on: All Nodes
| | | | | +-- Outer -> JOIN HASH [LeftOuter] [Cost: 17M, Rows: 38M] (PATH ID: 5) Inner (BROADCAST)
| | | | | | Join Cond: (fact_live_daily_summary.operator_id = fact_dealer_wise_details.user_id)
| | | | | | Join Filter: (fact_live_daily_summary.product_id = 3)
| | | | | | Materialize at Input: fact_live_daily_summary.product_id, fact_live_daily_summary.operator_id
| | | | | | Execute on: All Nodes
| | | | | | +-- Outer -> JOIN HASH [LeftOuter] [Cost: 17M, Rows: 38M] (PATH ID: 6) Inner (BROADCAST)
| | | | | | | Join Cond: (fact_live_daily_summary.product_id = dim_games_info.product_id) AND (fact_live_daily_summary.gametype_id = dim_games_info.game_id)
| | | | | | | Materialize at Input: fact_live_daily_summary.product_id, fact_live_daily_summary.gametype_id
| | | | | | | Execute on: All Nodes
| | | | | | | +-- Outer -> JOIN HASH [LeftOuter] [Cost: 17M, Rows: 38M] (PATH ID: 7) Outer (FILTER)
| | | | | | | | Join Cond: (fact_live_daily_summary.table_id_num = dim_tables.table_id_num)
| | | | | | | | Execute on: All Nodes
| | | | | | | | +-- Outer -> STORAGE ACCESS for fact_live_daily_summary [Cost: 17M, Rows: 38M] (PATH ID: 8)
| | | | | | | | | Projection: report.fact_live_daily_summary_DBD_48_rep_acubi
| | | | | | | | | Materialize: fact_live_daily_summary.table_id_num
| | | | | | | | | Filter: ((fact_live_daily_summary.summary >= '2023-05-08 00:00:00'::timestamp) AND (fact_live_daily_summary.summary < '2023-05-09 00:00:00'::timestamp))
| | | | | | | | | Execute on: All Nodes
| | | | | | | | +-- Inner -> STORAGE ACCESS for dim_tables [Cost: 49, Rows: 216] (PATH ID: 9)
| | | | | | | | | Projection: report.dim_tables_b0
| | | | | | | | | Materialize: dim_tables.table_id_num, dim_tables.table_full_name, dim_tables.game_type
| | | | | | | | | Execute on: All Nodes
| | | | | | | +-- Inner -> STORAGE ACCESS for dim_games_info [Cost: 380, Rows: 228K] (PATH ID: 10)
| | | | | | | | Projection: report.dim_games_info_super
| | | | | | | | Materialize: dim_games_info.product_id, dim_games_info.game_id
| | | | | | | | Execute on: All Nodes
| | | | | | +-- Inner -> STORAGE ACCESS for fact_dealer_wise_details [Cost: 18, Rows: 4K] (PATH ID: 11)
| | | | | | | Projection: report.fact_dealer_wise_details_super
| | | | | | | Materialize: fact_dealer_wise_details.user_id
| | | | | | | Execute on: All Nodes
| | | | | +-- Inner -> STORAGE ACCESS for dim_brand_info [Cost: 202, Rows: 29K] (PATH ID: 12)
| | | | | | Projection: report.dim_brand_info_b0
| | | | | | Materialize: dim_brand_info.product_id, dim_brand_info.brand_id, dim_brand_info.brand
| | | | | | Execute on: All Nodes

Answers

  • Those "BROADCAST" operations are going to kill your performance. You need to think about how these tables are segmented across the nodes. Go read this blog: https://www.vertica.com/blog/beware-segmentation-islands/ and then think about how to better segment these tables (or, in the case of smaller, dimension tables, or lookup tables, you're better of choosing to UNSEGMENT those instead), and then alter the projection definition to minimize or remove these kinds of operations.

    Also, in lieu of all that, you might consider building this out as a Flattened Table. A Flattened Table is a kind of Vertica construct that collapses all the dimensions into a base Fact table so that you don't have to do the joins at query time, they are performed at LOAD time. Then, the FACT table becomes a kind of automatically generated denormalized table that can minimize or eliminate joins. A flattened table would likely improve this query performance by a lot. You can learn more about them here: https://docs.vertica.com/12.0.x/en/data-analysis/flattened-tables/creating-flattened-tables/

  • VValdarVValdar Vertica Employee Employee

    You should also profile your query to see where the time is spent.
    And also get rid of some of the outer joins table you're using in the from but nowhere else.
    Posting DDLs would help too!.

  • moshegmosheg Vertica Employee Administrator

    In addition, to profile a query you can use the qprof script provided by Maurizio Felici available here:
    https://github.com/mfelici/qprof

    Run the qprof shell script this way:

    ./qprof.sh -f YourQuery.sql -o OutputFileName.out
    
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file