With clause materialization and query planner oddities?

Hi All, We had some queries which based on explain analyze output seemed to be spending much of their time applying to_char formatting to timestamp columns for each row result. To improve the performance, we tried using the new "with clause" introduced in Vertica 6.1.1. (we have just upgraded in order to make use of the feature) From our understanding, the "with clause" statements should be forcing the materialization of those formatted results, thereby avoiding the cost per row. Our initial testing showed that it yielded huge performance improvements. However, with further testing it seems that the query planner doesn't always make use of the materialized (pre-formatted) values. Here is a case where one query behaves correctly, but modifying it slightly seems to make the query planner fall back to applying the function per row. -- EXPECTED BEHAVIOUR (Group by: lt.hourOfDay) -- explain analyse with local_times as (select tstamp,TO_CHAR(new_time(tstamp, 'UTC', 'America/New_York'), 'HH24') as hourOfDay from (select distinct tstamp from delivery where tstamp between '2013-03-01 05:00:00.000' and '2013-04-01 03:59:59.999') as tstamps) select lt.hourOfDay as hourOfDay, sum(delivery.impressions) as impressions from delivery inner join local_times lt on lt.tstamp = delivery.tstamp left outer join geo_countries on delivery.country = geo_countries.code where delivery.tstamp between to_timestamp('2013-03-01 05:00:00.000', 'yyyy-MM-dd hh24:mi:ss.ff3') and to_timestamp('2013-04-01 03:59:59.999', 'yyyy-MM-dd hh24:mi:ss.ff3') group by hourOfDay; Access Path: +-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 149K, Rows: 10K] (PATH ID: 1) | Aggregates: sum(delivery.impressions) | Group By: lt.hourOfDay | Execute on: All Nodes | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 148K, Rows: 124M (743 RLE)] (PATH ID: 2) Inner (BROADCAST) | | Join Cond: (lt.tstamp = delivery.tstamp) | | Materialize at Input: delivery.tstamp | | Materialize at Output: delivery.impressions | | Execute on: All Nodes | | +-- Outer -> JOIN HASH [LeftOuter] [Cost: 41K, Rows: 124M (421 RLE)] (PATH ID: 3) | | | Join Cond: (delivery.country = geo_countries.code) | | | Execute on: All Nodes | | | Runtime Filter: (SIP1(MergeJoin): delivery.tstamp) | | | +-- Outer -> STORAGE ACCESS for delivery [Cost: 2K, Rows: 124M (421 RLE)] (PATH ID: 4) | | | | Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0 | | | | Materialize: delivery.country | | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:5 9.999'::timestamp)) | | | | Execute on: All Nodes | | | +-- Inner -> STORAGE ACCESS for geo_countries [Cost: 161, Rows: 255] (PATH ID: 5) | | | | Projection: rtb_production.geo_countries_DBD_5_rep_rtb_rtb_node0001 | | | | Materialize: geo_countries.code | | | | Execute on: All Nodes | | +-- Inner -> SELECT [Cost: 56, Rows: 743] (PATH ID: 6) | | | Execute on: All Nodes | | | Execute on: All Nodes | | | +---> GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 56, Rows: 743] (PATH ID: 8) | | | | Group By: delivery.tstamp | | | | Execute on: All Nodes | | | | +---> STORAGE ACCESS for delivery [Cost: 53, Rows: 124M (421 RLE)] (PATH ID: 9) | | | | | Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0 | | | | | Materialize: delivery.tstamp | | | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59 :59.999'::timestamp)) | | | | | Execute on: All Nodes -- UNEXPECTED BEHAVIOUR (Group by: to_char("timezone"('America/New_York', "timezone"('UTC', tstamps.tstamp)), 'HH24')) -- explain analyse with local_times as (select tstamp,TO_CHAR(new_time(tstamp, 'UTC', 'America/New_York'), 'HH24') as hourOfDay from (select distinct tstamp from delivery where tstamp between '2013-03-01 05:00:00.000' and '2013-04-01 03:59:59.999') as tstamps) select lt.hourOfDay as hourOfDay, sum(delivery.impressions) as impressions from delivery inner join local_times lt on lt.tstamp = delivery.tstamp where delivery.tstamp between to_timestamp('2013-03-01 05:00:00.000', 'yyyy-MM-dd hh24:mi:ss.ff3') and to_timestamp('2013-04-01 03:59:59.999', 'yyyy-MM-dd hh24:mi:ss.ff3') group by hourOfDay; Access Path: +-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 83K, Rows: 2] (PATH ID: 1) | Aggregates: sum(delivery.impressions) | Group By: to_char("timezone"('America/New_York', "timezone"('UTC', tstamps.tstamp)), 'HH24') | Execute on: All Nodes | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 83K, Rows: 124M (421 RLE)] (PATH ID: 2) Inner (BROADCAST) | | Join Cond: (tstamps.tstamp = delivery.tstamp) | | Materialize at Output: delivery.impressions | | Execute on: All Nodes | | +-- Outer -> STORAGE ACCESS for delivery [Cost: 53, Rows: 124M (421 RLE)] (PATH ID: 3) | | | Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0 | | | Materialize: delivery.tstamp | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59. 999'::timestamp)) | | | Execute on: All Nodes | | | Runtime Filter: (SIP1(MergeJoin): delivery.tstamp) | | +-- Inner -> SELECT [Cost: 56, Rows: 743] (PATH ID: 4) | | | Execute on: All Nodes | | | +---> GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 56, Rows: 743] (PATH ID: 5) | | | | Group By: delivery.tstamp | | | | Execute on: All Nodes | | | | +---> STORAGE ACCESS for delivery [Cost: 53, Rows: 124M (421 RLE)] (PATH ID: 6) | | | | | Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0 | | | | | Materialize: delivery.tstamp | | | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59 :59.999'::timestamp)) | | | | | Execute on: All Nodes Would anyone have any ideas about what's going on here? (we're a bit baffled) Thanks, Emanuel Pordes

Comments

  • Post got a little garbled/truncated... here's the rest: | | | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp)) | | | | | Execute on: All Nodes -- UNEXPECTED BEHAVIOUR (Group by: to_char("timezone"('America/New_York', "timezone"('UTC', tstamps.tstamp)), 'HH24')) -- explain analyse with local_times as (select tstamp,TO_CHAR(new_time(tstamp, 'UTC', 'America/New_York'), 'HH24') as hourOfDay from (select distinct tstamp from delivery where tstamp between '2013-03-01 05:00:00.000' and '2013-04-01 03:59:59.999') as tstamps) select lt.hourOfDay as hourOfDay, sum(delivery.impressions) as impressions from delivery inner join local_times lt on lt.tstamp = delivery.tstamp where delivery.tstamp between to_timestamp('2013-03-01 05:00:00.000', 'yyyy-MM-dd hh24:mi:ss.ff3') and to_timestamp('2013-04-01 03:59:59.999', 'yyyy-MM-dd hh24:mi:ss.ff3') group by hourOfDay; Access Path: +-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 83K, Rows: 2] (PATH ID: 1) | Aggregates: sum(delivery.impressions) | Group By: to_char("timezone"('America/New_York', "timezone"('UTC', tstamps.tstamp)), 'HH24') | Execute on: All Nodes | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 83K, Rows: 124M (421 RLE)] (PATH ID: 2) Inner (BROADCAST) | | Join Cond: (tstamps.tstamp = delivery.tstamp) | | Materialize at Output: delivery.impressions | | Execute on: All Nodes | | +-- Outer -> STORAGE ACCESS for delivery [Cost: 53, Rows: 124M (421 RLE)] (PATH ID: 3) | | | Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0 | | | Materialize: delivery.tstamp | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp)) | | | Execute on: All Nodes | | | Runtime Filter: (SIP1(MergeJoin): delivery.tstamp) | | +-- Inner -> SELECT [Cost: 56, Rows: 743] (PATH ID: 4) | | | Execute on: All Nodes | | | +---> GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 56, Rows: 743] (PATH ID: 5) | | | | Group By: delivery.tstamp | | | | Execute on: All Nodes | | | | +---> STORAGE ACCESS for delivery [Cost: 53, Rows: 124M (421 RLE)] (PATH ID: 6) | | | | | Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0 | | | | | Materialize: delivery.tstamp | | | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp)) | | | | | Execute on: All Nodes Would anyone have any ideas about what's going on here? (we're a bit baffled) Thanks, Emanuel Pordes
  • Again, garbled/truncated... trying again with plain text: Hi All, We had some queries which based on explain analyze output seemed to be spending much of their time applying to_char formatting to timestamp columns for each row result. To improve the performance, we tried using the new "with clause" introduced in Vertica 6.1.1. (we have just upgraded in order to make use of the feature) From our understanding, the "with clause" statements should be forcing the materialization of those formatted results, thereby avoiding the cost per row. Our initial testing showed that it yielded huge performance improvements. However, with further testing it seems that the query planner doesn't always make use of the materialized (pre-formatted) values. Here is a case where one query behaves correctly, but modifying it slightly seems to make the query planner fall back to applying the function per row. -- EXPECTED BEHAVIOUR (Group by: lt.hourOfDay) -- explain analyse with local_times as (select tstamp,TO_CHAR(new_time(tstamp, 'UTC', 'America/New_York'), 'HH24') as hourOfDay from (select distinct tstamp from delivery where tstamp between '2013-03-01 05:00:00.000' and '2013-04-01 03:59:59.999') as tstamps) select lt.hourOfDay as hourOfDay, sum(delivery.impressions) as impressions from delivery inner join local_times lt on lt.tstamp = delivery.tstamp left outer join geo_countries on delivery.country = geo_countries.code where delivery.tstamp between to_timestamp('2013-03-01 05:00:00.000', 'yyyy-MM-dd hh24:mi:ss.ff3') and to_timestamp('2013-04-01 03:59:59.999', 'yyyy-MM-dd hh24:mi:ss.ff3') group by hourOfDay; Access Path: +-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 149K, Rows: 10K] (PATH ID: 1) | Aggregates: sum(delivery.impressions) | Group By: lt.hourOfDay | Execute on: All Nodes | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 148K, Rows: 124M (743 RLE)] (PATH ID: 2) Inner (BROADCAST) | | Join Cond: (lt.tstamp = delivery.tstamp) | | Materialize at Input: delivery.tstamp | | Materialize at Output: delivery.impressions | | Execute on: All Nodes | | +-- Outer -> JOIN HASH [LeftOuter] [Cost: 41K, Rows: 124M (421 RLE)] (PATH ID: 3) | | | Join Cond: (delivery.country = geo_countries.code) | | | Execute on: All Nodes | | | Runtime Filter: (SIP1(MergeJoin): delivery.tstamp) | | | +-- Outer -> STORAGE ACCESS for delivery [Cost: 2K, Rows: 124M (421 RLE)] (PATH ID: 4) | | | | Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0 | | | | Materialize: delivery.country | | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp)) | | | | Execute on: All Nodes | | | +-- Inner -> STORAGE ACCESS for geo_countries [Cost: 161, Rows: 255] (PATH ID: 5) | | | | Projection: rtb_production.geo_countries_DBD_5_rep_rtb_rtb_node0001 | | | | Materialize: geo_countries.code | | | | Execute on: All Nodes | | +-- Inner -> SELECT [Cost: 56, Rows: 743] (PATH ID: 6) | | | Execute on: All Nodes | | | Execute on: All Nodes | | | +---> GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 56, Rows: 743] (PATH ID: 8) | | | | Group By: delivery.tstamp | | | | Execute on: All Nodes | | | | +---> STORAGE ACCESS for delivery [Cost: 53, Rows: 124M (421 RLE)] (PATH ID: 9) | | | | | Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0 | | | | | Materialize: delivery.tstamp | | | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp)) | | | | | Execute on: All Nodes -- UNEXPECTED BEHAVIOUR (Group by: to_char("timezone"('America/New_York', "timezone"('UTC', tstamps.tstamp)), 'HH24')) -- explain analyse with local_times as (select tstamp,TO_CHAR(new_time(tstamp, 'UTC', 'America/New_York'), 'HH24') as hourOfDay from (select distinct tstamp from delivery where tstamp between '2013-03-01 05:00:00.000' and '2013-04-01 03:59:59.999') as tstamps) select lt.hourOfDay as hourOfDay, sum(delivery.impressions) as impressions from delivery inner join local_times lt on lt.tstamp = delivery.tstamp where delivery.tstamp between to_timestamp('2013-03-01 05:00:00.000', 'yyyy-MM-dd hh24:mi:ss.ff3') and to_timestamp('2013-04-01 03:59:59.999', 'yyyy-MM-dd hh24:mi:ss.ff3') group by hourOfDay; Access Path: +-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 83K, Rows: 2] (PATH ID: 1) | Aggregates: sum(delivery.impressions) | Group By: to_char("timezone"('America/New_York', "timezone"('UTC', tstamps.tstamp)), 'HH24') | Execute on: All Nodes | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 83K, Rows: 124M (421 RLE)] (PATH ID: 2) Inner (BROADCAST) | | Join Cond: (tstamps.tstamp = delivery.tstamp) | | Materialize at Output: delivery.impressions | | Execute on: All Nodes | | +-- Outer -> STORAGE ACCESS for delivery [Cost: 53, Rows: 124M (421 RLE)] (PATH ID: 3) | | | Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0 | | | Materialize: delivery.tstamp | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp)) | | | Execute on: All Nodes | | | Runtime Filter: (SIP1(MergeJoin): delivery.tstamp) | | +-- Inner -> SELECT [Cost: 56, Rows: 743] (PATH ID: 4) | | | Execute on: All Nodes | | | +---> GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 56, Rows: 743] (PATH ID: 5) | | | | Group By: delivery.tstamp | | | | Execute on: All Nodes | | | | +---> STORAGE ACCESS for delivery [Cost: 53, Rows: 124M (421 RLE)] (PATH ID: 6) | | | | | Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0 | | | | | Materialize: delivery.tstamp | | | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp)) | | | | | Execute on: All Nodes Would anyone have any ideas about what's going on here? (we're a bit baffled) Thanks, Emanuel Pordes
  • | | | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp)) | | | | | Execute on: All Nodes -- UNEXPECTED BEHAVIOUR (Group by: to_char("timezone"('America/New_York', "timezone"('UTC', tstamps.tstamp)), 'HH24')) -- explain analyse with local_times as (select tstamp,TO_CHAR(new_time(tstamp, 'UTC', 'America/New_York'), 'HH24') as hourOfDay from (select distinct tstamp from delivery where tstamp between '2013-03-01 05:00:00.000' and '2013-04-01 03:59:59.999') as tstamps) select lt.hourOfDay as hourOfDay, sum(delivery.impressions) as impressions from delivery inner join local_times lt on lt.tstamp = delivery.tstamp where delivery.tstamp between to_timestamp('2013-03-01 05:00:00.000', 'yyyy-MM-dd hh24:mi:ss.ff3') and to_timestamp('2013-04-01 03:59:59.999', 'yyyy-MM-dd hh24:mi:ss.ff3') group by hourOfDay; Access Path: +-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 83K, Rows: 2] (PATH ID: 1) | Aggregates: sum(delivery.impressions) | Group By: to_char("timezone"('America/New_York', "timezone"('UTC', tstamps.tstamp)), 'HH24') | Execute on: All Nodes | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 83K, Rows: 124M (421 RLE)] (PATH ID: 2) Inner (BROADCAST) | | Join Cond: (tstamps.tstamp = delivery.tstamp) | | Materialize at Output: delivery.impressions | | Execute on: All Nodes | | +-- Outer -> STORAGE ACCESS for delivery [Cost: 53, Rows: 124M (421 RLE)] (PATH ID: 3) | | | Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0 | | | Materialize: delivery.tstamp | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp)) | | | Execute on: All Nodes | | | Runtime Filter: (SIP1(MergeJoin): delivery.tstamp) | | +-- Inner -> SELECT [Cost: 56, Rows: 743] (PATH ID: 4) | | | Execute on: All Nodes | | | +---> GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 56, Rows: 743] (PATH ID: 5) | | | | Group By: delivery.tstamp | | | | Execute on: All Nodes | | | | +---> STORAGE ACCESS for delivery [Cost: 53, Rows: 124M (421 RLE)] (PATH ID: 6) | | | | | Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0 | | | | | Materialize: delivery.tstamp | | | | | Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp)) | | | | | Execute on: All Nodes
  • Hi All, We had some queries which based on explain analyze output seemed to be spending much of their time applying to_char formatting to timestamp columns for each row result. To improve the performance, we tried using the new "with clause" introduced in Vertica 6.1.1. (we have just upgraded in order to make use of the feature) From our understanding, the "with clause" statements should be forcing the materialization of those formatted results, thereby avoiding the cost per row. Our initial testing showed that it yielded huge performance improvements. However, with further testing it seems that the query planner doesn't always make use of the materialized (pre-formatted) values. Here is a case where one query behaves correctly, but modifying it slightly seems to make the query planner fall back to applying the function per row.
      -- EXPECTED BEHAVIOUR (Group by: lt.hourOfDay)  --  explain analyse with local_times as (select tstamp,TO_CHAR(new_time(tstamp, 'UTC', 'America/New_York'), 'HH24') as hourOfDay from (select distinct tstamp from delivery where tstamp between '2013-03-01 05:00:00.000' and '2013-04-01 03:59:59.999') as tstamps) select lt.hourOfDay as hourOfDay, sum(delivery.impressions) as impressions from delivery inner join local_times lt on lt.tstamp = delivery.tstamp left outer join geo_countries on delivery.country = geo_countries.code where delivery.tstamp between to_timestamp('2013-03-01 05:00:00.000', 'yyyy-MM-dd hh24:mi:ss.ff3') and to_timestamp('2013-04-01 03:59:59.999', 'yyyy-MM-dd hh24:mi:ss.ff3') group by hourOfDay;    Access Path:   +-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 149K, Rows: 10K] (PATH ID: 1)   |  Aggregates: sum(delivery.impressions)   |  Group By: lt.hourOfDay   |  Execute on: All Nodes   | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 148K, Rows: 124M (743 RLE)] (PATH ID: 2) Inner (BROADCAST)   | |      Join Cond: (lt.tstamp = delivery.tstamp)   | |      Materialize at Input: delivery.tstamp   | |      Materialize at Output: delivery.impressions   | |      Execute on: All Nodes   | | +-- Outer -> JOIN HASH [LeftOuter] [Cost: 41K, Rows: 124M (421 RLE)] (PATH ID: 3)   | | |      Join Cond: (delivery.country = geo_countries.code)   | | |      Execute on: All Nodes   | | |      Runtime Filter: (SIP1(MergeJoin): delivery.tstamp)   | | | +-- Outer -> STORAGE ACCESS for delivery [Cost: 2K, Rows: 124M (421 RLE)] (PATH ID: 4)   | | | |      Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0   | | | |      Materialize: delivery.country   | | | |      Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp))   | | | |      Execute on: All Nodes   | | | +-- Inner -> STORAGE ACCESS for geo_countries [Cost: 161, Rows: 255] (PATH ID: 5)   | | | |      Projection: rtb_production.geo_countries_DBD_5_rep_rtb_rtb_node0001   | | | |      Materialize: geo_countries.code   | | | |      Execute on: All Nodes   | | +-- Inner -> SELECT [Cost: 56, Rows: 743] (PATH ID: 6)   | | |      Execute on: All Nodes   | | |      Execute on: All Nodes   | | | +---> GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 56, Rows: 743] (PATH ID: 8)   | | | |      Group By: delivery.tstamp   | | | |      Execute on: All Nodes   | | | | +---> STORAGE ACCESS for delivery [Cost: 53, Rows: 124M (421 RLE)] (PATH ID: 9)   | | | | |      Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0   | | | | |      Materialize: delivery.tstamp   | | | | |      Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp))   | | | | |      Execute on: All Nodes        -- UNEXPECTED BEHAVIOUR (Group by: to_char("timezone"('America/New_York', "timezone"('UTC', tstamps.tstamp)), 'HH24'))  --  explain analyse with local_times as (select tstamp,TO_CHAR(new_time(tstamp, 'UTC', 'America/New_York'), 'HH24') as hourOfDay from (select distinct tstamp from delivery where tstamp between '2013-03-01 05:00:00.000' and '2013-04-01 03:59:59.999') as tstamps) select lt.hourOfDay as hourOfDay, sum(delivery.impressions) as impressions from delivery inner join local_times lt on lt.tstamp = delivery.tstamp where delivery.tstamp between to_timestamp('2013-03-01 05:00:00.000', 'yyyy-MM-dd hh24:mi:ss.ff3') and to_timestamp('2013-04-01 03:59:59.999', 'yyyy-MM-dd hh24:mi:ss.ff3') group by hourOfDay;    Access Path:   +-GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 83K, Rows: 2] (PATH ID: 1)   |  Aggregates: sum(delivery.impressions)   |  Group By: to_char("timezone"('America/New_York', "timezone"('UTC', tstamps.tstamp)), 'HH24')   |  Execute on: All Nodes   | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 83K, Rows: 124M (421 RLE)] (PATH ID: 2) Inner (BROADCAST)   | |      Join Cond: (tstamps.tstamp = delivery.tstamp)   | |      Materialize at Output: delivery.impressions   | |      Execute on: All Nodes   | | +-- Outer -> STORAGE ACCESS for delivery [Cost: 53, Rows: 124M (421 RLE)] (PATH ID: 3)   | | |      Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0   | | |      Materialize: delivery.tstamp   | | |      Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp))   | | |      Execute on: All Nodes   | | |      Runtime Filter: (SIP1(MergeJoin): delivery.tstamp)   | | +-- Inner -> SELECT [Cost: 56, Rows: 743] (PATH ID: 4)   | | |      Execute on: All Nodes   | | | +---> GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 56, Rows: 743] (PATH ID: 5)   | | | |      Group By: delivery.tstamp   | | | |      Execute on: All Nodes   | | | | +---> STORAGE ACCESS for delivery [Cost: 53, Rows: 124M (421 RLE)] (PATH ID: 6)   | | | | |      Projection: rtb_production.delivery_DBD_1_seg_local_times2_b0   | | | | |      Materialize: delivery.tstamp   | | | | |      Filter: ((delivery.tstamp >= '2013-03-01 05:00:00'::timestamp) AND (delivery.tstamp <= '2013-04-01 03:59:59.999'::timestamp))   | | | | |      Execute on: All Nodes  
    Would anyone have any ideas about what's going on here? (we're a bit baffled) Thanks, Emanuel Pordes
  • Sorry for the wall of text above... hopefully this can help focus the question. What we really want to know is why does one query do this? Group By: lt.hourOfDay and the other do this? Group By: to_char("timezone"('America/New_York', "timezone"('UTC', tstamps.tstamp)), 'HH24') We're expecting to see Group By: lt.hourOfDay in both queries.
  • Do you finaly able to understend when vertica materialized  the conetnt of the WITH content and when not ?

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file