Options

union all and SIP

i'm probably missing something. but in the following query, i don't understand why SIP isn't pushed-down past union all into the two scans. SIP is pl at the union operator which causes a full table scan on foo and bar.

create table foo (x int not null, primary key(x));
insert into foo values(1);
create table bar (x int not null, primary key(x));
insert into bar values(3);
create table ace (x int not null, y int not null, primary key(x));
insert into ace values(5,5);

explain select f.x        from (
                select * from foo
                union all
                select * from bar
        ) f, ace a
        where f.x = a.x
        and a.y = 1;

Access Path: +-JOIN HASH [Cost: 8, Rows: 2 (NO STATISTICS)] (PATH ID: 1) Outer (LOCAL ROUND ROBIN)
 |  Join Cond: (f.x = a.x)
 |  Execute on: All Nodes
 | +-- Outer -> SELECT [Cost: 4, Rows: 4 (NO STATISTICS)] (PATH ID: 2)
 | |      Execute on: All Nodes
 | | +---> UNION ALL [Cost: 4, Rows: 4 (NO STATISTICS)] (PATH ID: 3)
 | | |      Execute on: All Nodes
 | | |      Runtime Filter: (SIP1(HashJoin): f.x)
 | | |      Execute on: All Nodes
 | | | +---> STORAGE ACCESS for foo [Cost: 2, Rows: 2 (NO STATISTICS)] (PATH ID: 5)
 | | | |      Projection: public.foo_b0
 | | | |      Materialize: foo.x
 | | | |      Execute on: All Nodes
 | | | +---> STORAGE ACCESS for bar [Cost: 2, Rows: 2 (NO STATISTICS)] (PATH ID: 7)
 | | | |      Projection: public.bar_b0
 | | | |      Materialize: bar.x
 | | | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for a [Cost: 3, Rows: 1 (NO STATISTICS)] (PATH ID: 8)
 | |      Projection: public.ace_b0
 | |      Materialize: a.x
 | |      Filter: (a.y = 1)
 | |      Execute on: All Nodes





Comments

  • Options
    Hi

    You need to gather statistics on your table , please do that and update if it help .

  • Options
    updated stats don't help. note that in my actual application (from which this test case was derived), the tables are much larger and the statistics are updated.
  • Options
    Thanks for having sample DDL. The way I'm reading your plan, the data is gathered in path 5 & 7, which is then passed into path 3. At this point, the predicate is pushed down.

    Before starting, lets bring the query to ANSI-92 standards:
    SELECT f.x  FROM   (SELECT * FROM foo          UNION ALL          SELECT * FROM bar) f         JOIN ace a           ON f.x = a.x  WHERE  a.y = 1; 
    For the predicate pushdown to occur sooner, the join has to occur before the union:
    SELECT a.x  FROM   foo a         JOIN ace b           ON a.x = b.x  WHERE  b.y = 1  UNION ALL  SELECT a.x  FROM   bar a         JOIN ace b           ON a.x = b.x  WHERE  b.y = 1 
    This produces:
     Access Path:  +-UNION ALL [Cost: 14, Rows: 2 (NO STATISTICS)]  |  Execute on: All Nodes  |  Execute on: All Nodes  | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 7, Rows: 1 (NO STATISTICS)] (PATH ID: 1)  | |      Join Cond: (a.x = b.x)  | |      Execute on: All Nodes  | | +-- Outer -> STORAGE ACCESS for a [Cost: 2, Rows: 1 (NO STATISTICS)] (PATH ID: 2)  | | |      Projection: public.foo_b0  | | |      Materialize: a.x  | | |      Execute on: All Nodes  | | |      Runtime Filter: (SIP1(MergeJoin): a.x)  | | +-- Inner -> STORAGE ACCESS for b [Cost: 4, Rows: 1 (NO STATISTICS)] (PATH ID: 3)  | | |      Projection: public.ace_b0  | | |      Materialize: b.x  | | |      Filter: (b.y = 1)  | | |      Execute on: All Nodes  | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 7, Rows: 1 (NO STATISTICS)] (PATH ID: 1)  | |      Join Cond: (a.x = b.x)  | |      Execute on: All Nodes  | | +-- Outer -> STORAGE ACCESS for a [Cost: 2, Rows: 1 (NO STATISTICS)] (PATH ID: 2)  | | |      Projection: public.bar_b0  | | |      Materialize: a.x  | | |      Execute on: All Nodes  | | |      Runtime Filter: (SIP2(MergeJoin): a.x)  | | +-- Inner -> STORAGE ACCESS for b [Cost: 4, Rows: 1 (NO STATISTICS)] (PATH ID: 3)  | | |      Projection: public.ace_b0  | | |      Materialize: b.x  | | |      Filter: (b.y = 1)  | | |      Execute on: All Nodes
    The above plan has better parallelism and is more favorable as it doesn't need to materialize b.y like below.
    SELECT *  FROM   (SELECT a.x,                 b.y          FROM   foo a                 JOIN ace b                   ON a.x = b.x          UNION ALL          SELECT a.x,                 b.y          FROM   bar a                 JOIN ace b                   ON a.x = b.x) c  WHERE  c.y = 1 
  • Options
    Hi 
    I am not sure , SIP is being applay to the Union All results .  

    kevinwilkinson , did you have the chance to talk to support ?
     
  • Options
    I don't understand your comment.
  • Options
    Reading your comment   first time give me the feeling that you think that SIP is taking place into the view ,  but reading it again you agree that its taking place on top of the view results .

    Sorry about miss understanding your comment .

     

    Anyway  , this ability is mandatory to have , the workaround you suggest is more costly   that real SIP push into View 

  • Options
    eli: i have not contacted support. i was just assuming they monitor the forums.  :)
    norbert: thanks for your suggestion. i think my query is functionally equivalent to your rewrites. but, that goes to the heart of my question. why should i have to manually rewrite? what is it about "union all" that suppresses SIP pushdown. perhaps another way to see it is, run explain on
         select f.x, f.y from foo f, ace a
         where f.x = a.x and f.y = a.y and a.z = 1;
    you'll see SIP pushdown into the scan of table foo. but, when i add the union all over tables foo and bar, there's no pushdown. i don't understand why. 
  • Options
    Kevin,

    I'm not well versed with Vertica's materialization strategies. However, I can tell you that logically it makes sense that the inner UNION ALL is evaluated first. This is because that result set may not necessarily be used in a later join, or there may be other predicates involved in the UNION ALL which are not eligible for SIP.

    There's an excellent paper on the technique in IEEE named Materialization Strategies in the Vertica Analytic Database: Lessons Learned. If you're on the HP network, I can help you find it.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file