good projection is not used when Join is added
Hi, This query select a11.cell_id AS cell_id, sum(a11.bytes_uplink) AS WJXBFS1, sum(a11.bytes_downlink) AS WJXBFS2, count(distinct a11.imsi) AS WJXBFS3, (sum(a11.bytes_downlink) + sum(a11.bytes_uplink)) AS WJXBFS4, sum(a11.packets_lost_uplink) AS WJXBFS5, sum(a11.packets_uplink) AS WJXBFS6, sum(a11.packets_lost_downlink) AS WJXBFS7, sum(a11.active_duration_time_downlink) AS WJXBFS8, sum(a11.packets_downlink) AS WJXBFS9, sum(a11.retransmitted_packets_uplink) AS WJXBFSa, sum(a11.active_duration_time_uplink) AS WJXBFSb, sum(a11.retransmitted_packets_downlink) AS WJXBFSc from Gn_Group_Fact a11 where to_timestamp(a11.datetime_id) between '2013-05-01 12:00:00' and '2013-05-31 12:00:00' group by a11.cell_id uses -GROUPBY PIPELINED for both the aggregates and the distinct count Access Path: +-GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 335K, Rows: 199] (PATH ID: 1) | Aggregates: count(DISTINCT a11.imsi), sum(), sum(), sum(), sum(), sum(), sum(), sum(), sum(), sum(), sum() | Group By: a11.cell_id | Execute on: All Nodes | +---> GROUPBY PIPELINED [Cost: 333K, Rows: 58K] (PATH ID: 2) | | Aggregates: sum(a11.bytes_uplink), sum(a11.bytes_downlink), sum(a11.packets_lost_uplink), sum(a11.packets_uplink), sum(a11.packets_lost_downlink), sum(a11.active_duration_time_downlink), sum(a11.packets_downlink), sum (a11.retransmitted_packets_uplink), sum(a11.active_duration_time_uplink), sum(a11.retransmitted_packets_downlink) | | Group By: a11.cell_id, a11.imsi | | Execute on: All Nodes | | +---> STORAGE ACCESS for a11 [Cost: 304K, Rows: 32M] (PATH ID: 3) | | | Projection: Gn_Group_Fact_Alex_1_b0 | | | Materialize: a11.cell_id, a11.imsi, a11.bytes_downlink, a11.bytes_uplink, a11.packets_downlink, a11.packets_uplink, a11.active_duration_time_downlink, a11.active_duration_time_uplink, a11.retransmitted_packets_downlink, a11.retransmitted_packets_uplink, a11.packets_lost_downlink, a11.packets_lost_uplink | | | Filter: (((to_timestamptz((a11.datetime_id)::float))::timestamp >= '2013-05-01 12:00:00'::timestamp) AND ((to_timestamptz((a11.datetime_id)::float))::timestamp <= '2013-05-31 12:00:00'::timestamp)) | | | Execute on: All Nodes if I add a join to a temporary table to filter by cell_id select a11.cell_id AS cell_id, sum(a11.bytes_uplink) AS WJXBFS1, sum(a11.bytes_downlink) AS WJXBFS2, count(distinct a11.imsi) AS WJXBFS3, (sum(a11.bytes_downlink) + sum(a11.bytes_uplink)) AS WJXBFS4, sum(a11.packets_lost_uplink) AS WJXBFS5, sum(a11.packets_uplink) AS WJXBFS6, sum(a11.packets_lost_downlink) AS WJXBFS7, sum(a11.active_duration_time_downlink) AS WJXBFS8, sum(a11.packets_downlink) AS WJXBFS9, sum(a11.retransmitted_packets_uplink) AS WJXBFSa, sum(a11.active_duration_time_uplink) AS WJXBFSb, sum(a11.retransmitted_packets_downlink) AS WJXBFSc from Gn_Group_Fact a11 join TXURMI16VMQ001 pa12 on (a11.cell_id = pa12.cell_id) where to_timestamp(a11.datetime_id) between '2013-05-01 12:00:00' and '2013-05-31 12:00:00' group by a11.cell_id it uses GROUPBY HASH +-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 238K, Rows: 199 (NO STATISTICS)] (PATH ID: 1) | Aggregates: count(DISTINCT a11.imsi), sum(), sum(), sum(), sum(), sum(), sum(), sum(), sum(), sum(), sum() | Group By: a11.cell_id | Execute on: All Nodes | +---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 237K, Rows: 58K (NO STATISTICS)] (PATH ID: 2) | | Aggregates: sum(a11.bytes_uplink), sum(a11.bytes_downlink), sum(a11.packets_lost_uplink), sum(a11.packets_uplink), sum(a11.packets_lost_downlink), sum(a11.active_duration_time_downlink), sum(a11.packets_downlink), sum (a11.retransmitted_packets_uplink), sum(a11.active_duration_time_uplink), sum(a11.retransmitted_packets_downlink) | | Group By: a11.cell_id, a11.imsi | | Execute on: All Nodes | | +---> JOIN HASH [Cost: 234K, Rows: 32M (40K RLE) (NO STATISTICS)] (PATH ID: 3) | | | Join Cond: (a11.cell_id = pa12.cell_id) | | | Materialize at Output: a11.imsi, a11.bytes_downlink, a11.bytes_uplink, a11.packets_downlink, a11.packets_uplink, a11.active_duration_time_downlink, a11.active_duration_time_uplink, a11.retransmitted_packets_downlink, a11.retransmitted_packets_uplink, a11.packets_lost_downlink, a11.packets_lost_uplink | | | Execute on: All Nodes | | | +-- Outer -> STORAGE ACCESS for a11 [Cost: 490, Rows: 32M (40K RLE)] (PATH ID: 4) | | | | Projection: Gn_Group_Fact_DBD_16_seg_DAC_noTemp_b0 | | | | Materialize: a11.cell_id | | | | Filter: (((to_timestamptz((a11.datetime_id)::float))::timestamp >= '2013-05-01 12:00:00'::timestamp) AND ((to_timestamptz((a11.datetime_id)::float))::timestamp <= '2013-05-31 12:00:00'::timestamp)) | | | | Execute on: All Nodes | | | | Runtime Filter: (SIP1(HashJoin): a11.cell_id) | | | +-- Inner -> STORAGE ACCESS for pa12 [Cost: 604, Rows: 10K (NO STATISTICS)] (PATH ID: 5) | | | | Projection: v_temp_schema.TXURMI16VMQ001_node0002 | | | | Materialize: pa12.cell_id | | | | Execute on: All Nodes it also uses GROUPBY HASH if the join is substituted by a subquery select a11.cell_id AS cell_id, sum(a11.bytes_uplink) AS WJXBFS1, sum(a11.bytes_downlink) AS WJXBFS2, count(distinct a11.imsi) AS WJXBFS3, (sum(a11.bytes_downlink) + sum(a11.bytes_uplink)) AS WJXBFS4, sum(a11.packets_lost_uplink) AS WJXBFS5, sum(a11.packets_uplink) AS WJXBFS6, sum(a11.packets_lost_downlink) AS WJXBFS7, sum(a11.active_duration_time_downlink) AS WJXBFS8, sum(a11.packets_downlink) AS WJXBFS9, sum(a11.retransmitted_packets_uplink) AS WJXBFSa, sum(a11.active_duration_time_uplink) AS WJXBFSb, sum(a11.retransmitted_packets_downlink) AS WJXBFSc from Gn_Group_Fact a11 where to_timestamp(a11.datetime_id) between '2013-05-01 12:00:00' and '2013-05-31 12:00:00' and a11.cell_id in (select cell_id from TXURMI16VMQ001) group by a11.cell_id Is there a way to overcome the latter projection choice and force Vertica to use same projection as for the first execution plan? Thanks, Alex.
0
Comments