Resegment occurs everytime
Hi,
i have 2 tables and has very less data ....but when i am doing the Explain plan on joining two tables it is saying the Resegment(outer) and Resegment(Inner).Both are using the super projection and joining also on same daat typ column.
QUERY PLAN DESCRIPTION:
------------------------------
explain with a as ( select a,b from "263723_egbw_rvcpoc".a2 where a=2 ),b as ( select a,b from "263723_egbw_rvcpoc".b2) select a1.a,a1.b,b1.a,b1.b from a a1 left join b b1 on a1.a=b1.a;
Access Path:
+-JOIN MERGEJOIN(inputs presorted) [LeftOuter] [Cost: 59, Rows: 11] (PATH ID: 1) Outer (RESEGMENT) Inner (RESEGMENT)
| Join Cond: (a1.a = b1.a)
| Execute on: All Nodes
| +-- Outer -> SELECT [Cost: 24, Rows: 1] (PATH ID: 2)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for a2 [Cost: 24, Rows: 1] (PATH ID: 3)
| | | Projection: 263723_egbw_rvcpoc.a2_b0
| | | Materialize: a2.a, a2.b
| | | Filter: (a2.a = 2)
| | | Execute on: All Nodes
| +-- Inner -> SELECT [Cost: 33, Rows: 11] (PATH ID: 4)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for b2 [Cost: 33, Rows: 11] (PATH ID: 5)
| | | Projection: 263723_egbw_rvcpoc.b2_b0
| | | Materialize: b2.a, b2.b
| | | Execute on: All Nodes
Projections used
263723_egbw_rvcpoc.b2_b0
CREATE PROJECTION "263723_egbw_rvcpoc".b2_b0 /*+basename(b2),createtype(L)*/
(
a,
b
)
AS
SELECT b2.a,
b2.b
FROM "263723_egbw_rvcpoc".b2
ORDER BY b2.a,
b2.b
SEGMENTED BY hash(b2.a, b2.b) ALL NODES OFFSET 0;
CREATE PROJECTION "263723_egbw_rvcpoc".a2_b0 /*+basename(a2),createtype(L)*/
(
a,
b
)
AS
SELECT a2.a,
a2.b
FROM "263723_egbw_rvcpoc".a2
ORDER BY a2.a,
a2.b
SEGMENTED BY hash(a2.a, a2.b) ALL NODES OFFSET 0;
Just wanted to know why Outer (RESEGMENT) Inner (RESEGMENT) both occurs even if projection columns are sorted.
Comments
Hi ,
Vertica build execution plan with RESEGMENT plan , as a results of the fact that your query is joining your data only based on the first column of the “segmented by” , assumes you have 1K records for a=100 , the value of 100 is distributed among your cluster , only 100,<second value > is located on specific node (only in this case Vertica will avoid RESEGMENT execution plan)
To avoid RESEGMENT you have two options :
explain WITH a AS
(SELECT a,
b
FROM a2
WHERE a=2),
b AS
(SELECT a,
b
FROM b2)
SELECT a1.a,
a1.b,
b1.a,
b1.b
FROM a a1
LEFT JOIN b b1 ON a1.a=b1.a and a1.b=b1.b;
Access Path:
+-JOIN MERGEJOIN(inputs presorted) [LeftOuter] [Cost: 3, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
| Join Cond: (a1.a = b1.a) AND (a1.b = b1.b)
| Execute on: All Nodes
| +-- Outer -> SELECT [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for a2 [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 3)
| | | Projection: public.a2_b0_b0
| | | Materialize: a2.a, a2.b
| | | Filter: (a2.a = 2)
| | | Execute on: All Nodes
| +-- Inner -> SELECT [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 4)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for b2 [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 5)
| | | Projection: public.b2_b0_b0
| | | Materialize: b2.a, b2.b
| | | Execute on: All Nodes
I hope you will find it useful
Thanks