Options

Resegment occurs everytime

Hi,

i have 2 tables and has very less data ....but when i am doing the Explain plan on joining two tables it is saying the Resegment(outer) and Resegment(Inner).Both are using the super projection and joining also on same daat typ column.

 

QUERY PLAN DESCRIPTION:
------------------------------

explain with a as ( select a,b from "263723_egbw_rvcpoc".a2 where a=2 ),b as ( select a,b from "263723_egbw_rvcpoc".b2) select a1.a,a1.b,b1.a,b1.b from a a1 left join b b1 on a1.a=b1.a;

Access Path:
+-JOIN MERGEJOIN(inputs presorted) [LeftOuter] [Cost: 59, Rows: 11] (PATH ID: 1) Outer (RESEGMENT) Inner (RESEGMENT)
| Join Cond: (a1.a = b1.a)
| Execute on: All Nodes
| +-- Outer -> SELECT [Cost: 24, Rows: 1] (PATH ID: 2)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for a2 [Cost: 24, Rows: 1] (PATH ID: 3)
| | | Projection: 263723_egbw_rvcpoc.a2_b0
| | | Materialize: a2.a, a2.b
| | | Filter: (a2.a = 2)
| | | Execute on: All Nodes
| +-- Inner -> SELECT [Cost: 33, Rows: 11] (PATH ID: 4)
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for b2 [Cost: 33, Rows: 11] (PATH ID: 5)
| | | Projection: 263723_egbw_rvcpoc.b2_b0
| | | Materialize: b2.a, b2.b
| | | Execute on: All Nodes

 

 

Projections used

263723_egbw_rvcpoc.b2_b0

 

CREATE PROJECTION "263723_egbw_rvcpoc".b2_b0 /*+basename(b2),createtype(L)*/
(
a,
b
)
AS
SELECT b2.a,
b2.b
FROM "263723_egbw_rvcpoc".b2
ORDER BY b2.a,
b2.b
SEGMENTED BY hash(b2.a, b2.b) ALL NODES OFFSET 0;

 

 

CREATE PROJECTION "263723_egbw_rvcpoc".a2_b0 /*+basename(a2),createtype(L)*/
(
a,
b
)
AS
SELECT a2.a,
a2.b
FROM "263723_egbw_rvcpoc".a2
ORDER BY a2.a,
a2.b
SEGMENTED BY hash(a2.a, a2.b) ALL NODES OFFSET 0;

 

Just wanted to know why Outer (RESEGMENT) Inner (RESEGMENT) both occurs even if projection columns are sorted.

 

 

 

Comments

  • Options

    Hi ,

    Vertica build  execution plan with RESEGMENT   plan , as a results of the fact that  your query is  joining  your data only based on the first column  of the  “segmented by” , assumes you have 1K records for a=100 , the value of 100 is distributed among your cluster , only 100,<second value >  is located on specific node (only in this case Vertica will avoid RESEGMENT   execution plan)

    To avoid RESEGMENT you have two options :

    1. Used all the columns that exists on your segmented by in the join -> LEFT JOIN b b1 ON a1.a=b1.a and a1.b=b1.b;
    2. Change your production to  “segmented by” only by  a column

     

    explain WITH a AS
    (SELECT a,
    b
    FROM a2
    WHERE a=2),
    b AS
    (SELECT a,
    b
    FROM b2)
    SELECT a1.a,
    a1.b,
    b1.a,
    b1.b
    FROM a a1
    LEFT JOIN b b1 ON a1.a=b1.a and a1.b=b1.b;

    Access Path:
    +-JOIN MERGEJOIN(inputs presorted) [LeftOuter] [Cost: 3, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
    | Join Cond: (a1.a = b1.a) AND (a1.b = b1.b)
    | Execute on: All Nodes
    | +-- Outer -> SELECT [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 2)
    | | Execute on: All Nodes
    | | +---> STORAGE ACCESS for a2 [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 3)
    | | | Projection: public.a2_b0_b0
    | | | Materialize: a2.a, a2.b
    | | | Filter: (a2.a = 2)
    | | | Execute on: All Nodes
    | +-- Inner -> SELECT [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 4)
    | | Execute on: All Nodes
    | | +---> STORAGE ACCESS for b2 [Cost: 1, Rows: 1 (NO STATISTICS)] (PATH ID: 5)
    | | | Projection: public.b2_b0_b0
    | | | Materialize: b2.a, b2.b
    | | | Execute on: All Nodes

     

     

    I hope you will find it useful

    Thanks

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file