good projection is not used when Join is added

Alexander_Dubit · July 2013

Hi, This query select a11.cell_id AS cell_id, sum(a11.bytes_uplink) AS WJXBFS1, sum(a11.bytes_downlink) AS WJXBFS2, count(distinct a11.imsi) AS WJXBFS3, (sum(a11.bytes_downlink) + sum(a11.bytes_uplink)) AS WJXBFS4, sum(a11.packets_lost_uplink) AS WJXBFS5, sum(a11.packets_uplink) AS WJXBFS6, sum(a11.packets_lost_downlink) AS WJXBFS7, sum(a11.active_duration_time_downlink) AS WJXBFS8, sum(a11.packets_downlink) AS WJXBFS9, sum(a11.retransmitted_packets_uplink) AS WJXBFSa, sum(a11.active_duration_time_uplink) AS WJXBFSb, sum(a11.retransmitted_packets_downlink) AS WJXBFSc from Gn_Group_Fact a11 where to_timestamp(a11.datetime_id) between '2013-05-01 12:00:00' and '2013-05-31 12:00:00' group by a11.cell_id uses -GROUPBY PIPELINED for both the aggregates and the distinct count Access Path: +-GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 335K, Rows: 199] (PATH ID: 1) | Aggregates: count(DISTINCT a11.imsi), sum(), sum(), sum(), sum(), sum(), sum(), sum(), sum(), sum(), sum() | Group By: a11.cell_id | Execute on: All Nodes | +---> GROUPBY PIPELINED [Cost: 333K, Rows: 58K] (PATH ID: 2) | | Aggregates: sum(a11.bytes_uplink), sum(a11.bytes_downlink), sum(a11.packets_lost_uplink), sum(a11.packets_uplink), sum(a11.packets_lost_downlink), sum(a11.active_duration_time_downlink), sum(a11.packets_downlink), sum (a11.retransmitted_packets_uplink), sum(a11.active_duration_time_uplink), sum(a11.retransmitted_packets_downlink) | | Group By: a11.cell_id, a11.imsi | | Execute on: All Nodes | | +---> STORAGE ACCESS for a11 [Cost: 304K, Rows: 32M] (PATH ID: 3) | | | Projection: Gn_Group_Fact_Alex_1_b0 | | | Materialize: a11.cell_id, a11.imsi, a11.bytes_downlink, a11.bytes_uplink, a11.packets_downlink, a11.packets_uplink, a11.active_duration_time_downlink, a11.active_duration_time_uplink, a11.retransmitted_packets_downlink, a11.retransmitted_packets_uplink, a11.packets_lost_downlink, a11.packets_lost_uplink | | | Filter: (((to_timestamptz((a11.datetime_id)::float))::timestamp >= '2013-05-01 12:00:00'::timestamp) AND ((to_timestamptz((a11.datetime_id)::float))::timestamp <= '2013-05-31 12:00:00'::timestamp)) | | | Execute on: All Nodes if I add a join to a temporary table to filter by cell_id select a11.cell_id AS cell_id, sum(a11.bytes_uplink) AS WJXBFS1, sum(a11.bytes_downlink) AS WJXBFS2, count(distinct a11.imsi) AS WJXBFS3, (sum(a11.bytes_downlink) + sum(a11.bytes_uplink)) AS WJXBFS4, sum(a11.packets_lost_uplink) AS WJXBFS5, sum(a11.packets_uplink) AS WJXBFS6, sum(a11.packets_lost_downlink) AS WJXBFS7, sum(a11.active_duration_time_downlink) AS WJXBFS8, sum(a11.packets_downlink) AS WJXBFS9, sum(a11.retransmitted_packets_uplink) AS WJXBFSa, sum(a11.active_duration_time_uplink) AS WJXBFSb, sum(a11.retransmitted_packets_downlink) AS WJXBFSc from Gn_Group_Fact a11 join TXURMI16VMQ001 pa12 on (a11.cell_id = pa12.cell_id) where to_timestamp(a11.datetime_id) between '2013-05-01 12:00:00' and '2013-05-31 12:00:00' group by a11.cell_id it uses GROUPBY HASH +-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 238K, Rows: 199 (NO STATISTICS)] (PATH ID: 1) | Aggregates: count(DISTINCT a11.imsi), sum(), sum(), sum(), sum(), sum(), sum(), sum(), sum(), sum(), sum() | Group By: a11.cell_id | Execute on: All Nodes | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 237K, Rows: 58K (NO STATISTICS)] (PATH ID: 2) | | Aggregates: sum(a11.bytes_uplink), sum(a11.bytes_downlink), sum(a11.packets_lost_uplink), sum(a11.packets_uplink), sum(a11.packets_lost_downlink), sum(a11.active_duration_time_downlink), sum(a11.packets_downlink), sum (a11.retransmitted_packets_uplink), sum(a11.active_duration_time_uplink), sum(a11.retransmitted_packets_downlink) | | Group By: a11.cell_id, a11.imsi | | Execute on: All Nodes | | +---> JOIN HASH [Cost: 234K, Rows: 32M (40K RLE) (NO STATISTICS)] (PATH ID: 3) | | | Join Cond: (a11.cell_id = pa12.cell_id) | | | Materialize at Output: a11.imsi, a11.bytes_downlink, a11.bytes_uplink, a11.packets_downlink, a11.packets_uplink, a11.active_duration_time_downlink, a11.active_duration_time_uplink, a11.retransmitted_packets_downlink, a11.retransmitted_packets_uplink, a11.packets_lost_downlink, a11.packets_lost_uplink | | | Execute on: All Nodes | | | +-- Outer -> STORAGE ACCESS for a11 [Cost: 490, Rows: 32M (40K RLE)] (PATH ID: 4) | | | | Projection: Gn_Group_Fact_DBD_16_seg_DAC_noTemp_b0 | | | | Materialize: a11.cell_id | | | | Filter: (((to_timestamptz((a11.datetime_id)::float))::timestamp >= '2013-05-01 12:00:00'::timestamp) AND ((to_timestamptz((a11.datetime_id)::float))::timestamp <= '2013-05-31 12:00:00'::timestamp)) | | | | Execute on: All Nodes | | | | Runtime Filter: (SIP1(HashJoin): a11.cell_id) | | | +-- Inner -> STORAGE ACCESS for pa12 [Cost: 604, Rows: 10K (NO STATISTICS)] (PATH ID: 5) | | | | Projection: v_temp_schema.TXURMI16VMQ001_node0002 | | | | Materialize: pa12.cell_id | | | | Execute on: All Nodes it also uses GROUPBY HASH if the join is substituted by a subquery select a11.cell_id AS cell_id, sum(a11.bytes_uplink) AS WJXBFS1, sum(a11.bytes_downlink) AS WJXBFS2, count(distinct a11.imsi) AS WJXBFS3, (sum(a11.bytes_downlink) + sum(a11.bytes_uplink)) AS WJXBFS4, sum(a11.packets_lost_uplink) AS WJXBFS5, sum(a11.packets_uplink) AS WJXBFS6, sum(a11.packets_lost_downlink) AS WJXBFS7, sum(a11.active_duration_time_downlink) AS WJXBFS8, sum(a11.packets_downlink) AS WJXBFS9, sum(a11.retransmitted_packets_uplink) AS WJXBFSa, sum(a11.active_duration_time_uplink) AS WJXBFSb, sum(a11.retransmitted_packets_downlink) AS WJXBFSc from Gn_Group_Fact a11 where to_timestamp(a11.datetime_id) between '2013-05-01 12:00:00' and '2013-05-31 12:00:00' and a11.cell_id in (select cell_id from TXURMI16VMQ001) group by a11.cell_id Is there a way to overcome the latter projection choice and force Vertica to use same projection as for the first execution plan? Thanks, Alex.

[Deleted User] · October 2013

Hi Alex, I would probably start by confirming that the other projection would actually be an improvement. To do that you can use the name of the projection directly in your query and compare performance. Give that a try first and let me know what you find out.

good projection is not used when Join is added

Comments

Leave a Comment