The explain for upper and non-Upperb are different and execution times vary widely

jasonhe · June 2020

hi guys,
In vertica 9.2, Which I use now, I encounter the following problem. When UPPERB is used, the execution result is completed in 3s, but I don't use UPPERB, which needs 50s to complete. Here are SQL and EXPLAIN

use UPPERB (Take 3 s)
select sum( ext_reseller_price) sourceExtRsPrice ,sum(ext_eu_price) sourceExtCustomerPrice ,sum(ext_msrp) sourceExtMsrp from dw_us.dws_stellr_consumption_azure_usage_1d consume where date_flag between to_date(to_char(current_date, 'YYYY-MM-01'), 'YYYY-MM-DD') and current_date and UPPERB(consume.resource_uri) in ( select UPPERB(resource_uri) from dm_us.dm_stellr_azure_filter_view fv where 1 = 1 and reseller_no=123456 group by UPPERB(resource_uri) )
+-GROUPBY NOTHING [Cost: 172M, Rows: 1] (PATH ID: 1)
Aggregates: sum(consume.ext_reseller_price), sum(consume.ext_eu_price), sum(consume.ext_msrp) Execute on: All Nodes
+---> JOIN HASH [Semi] [Cost: 171M, Rows: 590K] (PATH ID: 2) Inner (BROADCAST)
Join Cond: (upperb(consume.resource_uri) = VAL(2)) Materialize at Output: consume.ext_msrp, consume.ext_reseller_price, consume.ext_eu_price Execute on: All Nodes
+-- Outer -> STORAGE ACCESS for consume [Cost: 13M, Rows: 1M] (PATH ID: 3)
Projection: dw_us.dws_stellr_consumption_azure_usage_1d_DBD_1_seg_test2_b0 Materialize: consume.resource_uri Filter: ((consume.date_flag >= '2020-06-01'::date) AND (consume.date_flag <= '2020-06-26'::date)) Execute on: All Nodes Runtime Filter: (SIP1(HashJoin): upperb(consume.resource_uri))
+---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 41M, Rows: 700K] (PATH ID: 5)
Group By: upperb(fv.resource_uri) Execute on: All Nodes
+---> STORAGE ACCESS for fv [Cost: 2M, Rows: 700K] (PATH ID: 6)
Projection: dm_us.dm_stellr_azure_filter_view_DBD_2_seg_test2_b0 Materialize: fv.resource_uri Filter: (fv.reseller_no = 123456) Execute on: All Nodes

no use UPPERB(Take 50s)
select sum( ext_reseller_price) sourceExtRsPrice
,sum(ext_eu_price) sourceExtCustomerPrice
,sum(ext_msrp) sourceExtMsrp
from dw_us.dws_stellr_consumption_azure_usage_1d consume
where date_flag between to_date(to_char(current_date, 'YYYY-MM-01'), 'YYYY-MM-DD') and current_date
and consume.resource_uri in (
select resource_uri
from dm_us.dm_stellr_azure_filter_view fv
where 1 = 1 and reseller_no=123456
group by resource_uri
)
+-GROUPBY NOTHING [Cost: 144M, Rows: 1] (PATH ID: 1)
Aggregates: sum(), sum(), sum() Execute on: All Nodes
+---> JOIN HASH [Semi] [Cost: 143M, Rows: 1M] (PATH ID: 2) Outer (RESEGMENT)(LOCAL ROUND ROBIN)
Join Cond: (consume.resource_uri = VAL(2)) Execute on: All Nodes
+-- Outer -> STORAGE ACCESS for consume [Cost: 13M, Rows: 1M] (PUSHED GROUPING) (PATH ID: 3)
Projection: dw_us.dws_stellr_consumption_azure_usage_1d_DBD_1_seg_test2_b0 Materialize: consume.ext_eu_price, consume.resource_uri, consume.ext_msrp, consume.ext_reseller_price Filter: ((consume.date_flag >= '2020-06-01'::date) AND (consume.date_flag <= '2020-06-26'::date)) Execute on: All Nodes
+---> GROUPBY HASH (GLOBAL RESEGMENT GROUPS) (LOCAL RESEGMENT GROUPS) [Cost: 41M, Rows: 700K] (PATH ID: 5)
Group By: fv.resource_uri Execute on: All Nodes
+---> STORAGE ACCESS for fv [Cost: 2M, Rows: 700K] (PATH ID: 6)
Projection: dm_us.dm_stellr_azure_filter_view_DBD_2_seg_test2_b0 Materialize: fv.resource_uri Filter: (fv.reseller_no = 123456) Execute on: All Nodes

Hibiki · July 2020

I think the following differences affected performance.

[use UPPERB]
+---> JOIN HASH [Semi] [Cost: 171M, Rows: 590K] (PATH ID: 2) Inner (BROADCAST)
Materialize at Output: consume.ext_msrp, consume.ext_reseller_price, consume.ext_eu_price
+-- Outer -> STORAGE ACCESS for consume [Cost: 13M, Rows: 1M] (PATH ID: 3)
Runtime Filter: (SIP1(HashJoin): upperb(consume.resource_uri))

[no use UPPERB]
+---> JOIN HASH [Semi] [Cost: 143M, Rows: 1M] (PATH ID: 2) Outer (RESEGMENT)(LOCAL ROUND ROBIN)
+-- Outer -> STORAGE ACCESS for consume [Cost: 13M, Rows: 1M] (PUSHED GROUPING) (PATH ID: 3)
Materialize: consume.ext_eu_price, consume.resource_uri, consume.ext_msrp, consume.ext_reseller_price

For 'use UPPERB', Runtime Filter appeared and it could reduce the number of rows by this filter when accessing the table. It might be able to reduce the disk I/O and could reduce the data before processing the join operator. The optimizer chose RESEGMENT for the join operator in case of no use UPPERB, but chose BROADCAST in case of use UPPERB. It didn't choose RESEGMENT because the function (UPPERB) was used as the join key. But, on the other hand, Sidewise Information Passing (SIP) is available when choosing BROADCAST, not available when choosing RESEGMENT.

As a result, in your case, BROADCAST + SIP was faster than RESEGMENT.

We're Moving!

Create My New Community Account Now

The explain for upper and non-Upperb are different and execution times vary widely

Answers

Leave a Comment