grouping sets - no local resegment groups

Hi,

When I use more than 1 group set in the grouping set clause, the optimizer always do global resegment, but when I place in the clause only 1 grouping set a local resgment take place (meaning both GROUPBYs qualify for local resegment but when I put them together in the grouping set clause a global resegment takes place instead of local).

The global resegment shuffles millions of rows between nodes, and the query runs very long time. Any way to overcome it and do local resegment?

Two grouping sets:
explain Select udid_int, max(revenue) as revenue, product_name, coalesce(max(case when (period <= 6 and ('2018-11-13' - install_date) >= 6) then revenue end), 0) as repart_revenue_d__6_repartend FROM client_applications.vw_fact_retargeting_activity WHERE 1 = 1 AND install_date >= '2018-01-01' AND install_date <= '2018-10-31' AND event_date >= '2018-01-01' AND install_date <= '2018-10-31' AND event_date <= '2018-10-31' Group By grouping sets ((udid_int, event_date, product_name), (udid_int, event_date) )

QUERY PLAN
Access Path:
+-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 13M, Rows: 352] (PATH ID: 1)
| Aggregates: max(vw_fact_retargeting_activity.revenue), max(CASE WHEN ((vw_fact_retargeting_activity.period <= 6) AND (('2018-11-13'::date - vw_fact_retargeting_activity.install_date) >= 6)) THEN vw_fact_retargeting_activity.revenue ELSE NULL::numeric(1,0) END)
| Group By: vw_fact_retargeting_activity.udid_int, vw_fact_retargeting_activity.event_date, vw_fact_retargeting_activity.product_name
| Grouping Sets: (vw_fact_retargeting_activity.udid_int, vw_fact_retargeting_activity.event_date, vw_fact_retargeting_activity.product_name, , ), (vw_fact_retargeting_activity.udid_int, vw_fact_retargeting_activity.event_date, , )
| Execute on: All Nodes
| Execute on: All Nodes
| +---> STORAGE ACCESS for fact_retargeting_activity [Cost: 4M, Rows: 668M] (PATH ID: 3)
| | Projection: dwh.fact_retargeting_activity_udid_int_event_date_super_b0
| | Materialize: fact_retargeting_activity.os_id, fact_retargeting_activity.product_id, fact_retargeting_activity.cohort_date, fact_retargeting_activity.event_date, fact_retargeting_activity.product_name, fact_retargeting_activity.udid_int, fact_retargeting_activity.period, fact_retargeting_activity.revenue
| | Filter: ((fact_retargeting_activity.cohort_date >= '2018-01-01'::date) AND (fact_retargeting_activity.cohort_date <= '2018-10-31'::date))
| | Filter: ((fact_retargeting_activity.event_date >= '2018-01-01'::date) AND (fact_retargeting_activity.event_date <= '2018-10-31'::date))
| | Filter: (fact_retargeting_activity.os_id <> 8)
| | Filter: (fact_retargeting_activity.product_id <> ALL (ARRAY[31]))
| | Filter: ((fact_retargeting_activity.os_id <> 3) OR ((fact_retargeting_activity.os_id = 3) AND (fact_retargeting_activity.product_id = ANY (ARRAY[7, 12, 28]))))
| | Execute on: All Nodes

1st grouping set without the 2nd:
QUERY PLAN
Access Path:
+-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 4M, Rows: 352] (PATH ID: 1)
| Aggregates: max(vw_fact_retargeting_activity.revenue), max(CASE WHEN ((vw_fact_retargeting_activity.period <= 6) AND (('2018-11-13'::date - vw_fact_retargeting_activity.install_date) >= 6)) THEN vw_fact_retargeting_activity.revenue ELSE NULL::numeric(1,0) END)
| Group By: vw_fact_retargeting_activity.udid_int, vw_fact_retargeting_activity.event_date, vw_fact_retargeting_activity.product_name
| Execute on: All Nodes
| Execute on: All Nodes
| +---> STORAGE ACCESS for fact_retargeting_activity [Cost: 4M, Rows: 668M] (PATH ID: 3)
| | Projection: dwh.fact_retargeting_activity_udid_int_event_date_super_b0
| | Materialize: fact_retargeting_activity.os_id, fact_retargeting_activity.product_id, fact_retargeting_activity.cohort_date, fact_retargeting_activity.event_date, fact_retargeting_activity.product_name, fact_retargeting_activity.udid_int, fact_retargeting_activity.period, fact_retargeting_activity.revenue
| | Filter: ((fact_retargeting_activity.cohort_date >= '2018-01-01'::date) AND (fact_retargeting_activity.cohort_date <= '2018-10-31'::date))
| | Filter: ((fact_retargeting_activity.event_date >= '2018-01-01'::date) AND (fact_retargeting_activity.event_date <= '2018-10-31'::date))
| | Filter: (fact_retargeting_activity.os_id <> 8)
| | Filter: (fact_retargeting_activity.product_id <> ALL (ARRAY[31]))
| | Filter: ((fact_retargeting_activity.os_id <> 3) OR ((fact_retargeting_activity.os_id = 3) AND (fact_retargeting_activity.product_id = ANY (ARRAY[7, 12, 28]))))
| | Execute on: All Nodes

2nd grouping set without the 1st:
QUERY PLAN
Access Path:
+-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 3M, Rows: 352] (PATH ID: 1)
| Aggregates: max(vw_fact_retargeting_activity.revenue), max(CASE WHEN ((vw_fact_retargeting_activity.period <= 6) AND (('2018-11-13'::date - vw_fact_retargeting_activity.install_date) >= 6)) THEN vw_fact_retargeting_activity.revenue ELSE NULL::numeric(1,0) END)
| Group By: vw_fact_retargeting_activity.udid_int, vw_fact_retargeting_activity.event_date
| Execute on: All Nodes
| Execute on: All Nodes
| +---> STORAGE ACCESS for fact_retargeting_activity [Cost: 3M, Rows: 668M] (PATH ID: 3)
| | Projection: dwh.fact_retargeting_activity_udid_int_event_date_super_b0
| | Materialize: fact_retargeting_activity.os_id, fact_retargeting_activity.product_id, fact_retargeting_activity.cohort_date, fact_retargeting_activity.event_date, fact_retargeting_activity.udid_int, fact_retargeting_activity.period, fact_retargeting_activity.revenue
| | Filter: ((fact_retargeting_activity.cohort_date >= '2018-01-01'::date) AND (fact_retargeting_activity.cohort_date <= '2018-10-31'::date))
| | Filter: ((fact_retargeting_activity.event_date >= '2018-01-01'::date) AND (fact_retargeting_activity.event_date <= '2018-10-31'::date))
| | Filter: (fact_retargeting_activity.os_id <> 8)
| | Filter: (fact_retargeting_activity.product_id <> ALL (ARRAY[31]))
| | Filter: ((fact_retargeting_activity.os_id <> 3) OR ((fact_retargeting_activity.os_id = 3) AND (fact_retargeting_activity.product_id = ANY (ARRAY[7, 12, 28]))))
| | Execute on: All Nodes

Comments

  • BTW, vertica version is 8.1.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Each example references the dwh.fact_retargeting_activity_udid_int_event_date_super_b0 projection. What are the columns in its segmentation clause?

    Are there any other tables used by the view client_applications.vw_fact_retargeting_activity?

  • Hi,

    The segmentation columns are: udid_int & event_date.

    The view contains couple of left joins but there arent any columns from those tables in the query.
  • Just to make sure it's not because of the view, I changed the query to use the physical table dwh.fact_retargeting_activity and still got global resegment.

  • edited November 2018

    @Jim_Knicely are there any plans to improve the execution plan so local group by will be used when possible? This feature is not really usable because of the data shuffling (even when I place 2 identical sets I get global resegent).

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file