union all and SIP
i'm probably missing something. but in the following query, i don't understand why SIP isn't pushed-down past union all into the two scans. SIP is pl at the union operator which causes a full table scan on foo and bar.
create table foo (x int not null, primary key(x));
insert into foo values(1);
create table bar (x int not null, primary key(x));
insert into bar values(3);
create table ace (x int not null, y int not null, primary key(x));
insert into ace values(5,5);
explain select f.x from (
select * from foo
union all
select * from bar
) f, ace a
where f.x = a.x
and a.y = 1;
Access Path: +-JOIN HASH [Cost: 8, Rows: 2 (NO STATISTICS)] (PATH ID: 1) Outer (LOCAL ROUND ROBIN)
| Join Cond: (f.x = a.x)
| Execute on: All Nodes
| +-- Outer -> SELECT [Cost: 4, Rows: 4 (NO STATISTICS)] (PATH ID: 2)
| | Execute on: All Nodes
| | +---> UNION ALL [Cost: 4, Rows: 4 (NO STATISTICS)] (PATH ID: 3)
| | | Execute on: All Nodes
| | | Runtime Filter: (SIP1(HashJoin): f.x)
| | | Execute on: All Nodes
| | | +---> STORAGE ACCESS for foo [Cost: 2, Rows: 2 (NO STATISTICS)] (PATH ID: 5)
| | | | Projection: public.foo_b0
| | | | Materialize: foo.x
| | | | Execute on: All Nodes
| | | +---> STORAGE ACCESS for bar [Cost: 2, Rows: 2 (NO STATISTICS)] (PATH ID: 7)
| | | | Projection: public.bar_b0
| | | | Materialize: bar.x
| | | | Execute on: All Nodes
| +-- Inner -> STORAGE ACCESS for a [Cost: 3, Rows: 1 (NO STATISTICS)] (PATH ID: 8)
| | Projection: public.ace_b0
| | Materialize: a.x
| | Filter: (a.y = 1)
| | Execute on: All Nodes
create table foo (x int not null, primary key(x));
insert into foo values(1);
create table bar (x int not null, primary key(x));
insert into bar values(3);
create table ace (x int not null, y int not null, primary key(x));
insert into ace values(5,5);
explain select f.x from (
select * from foo
union all
select * from bar
) f, ace a
where f.x = a.x
and a.y = 1;
Access Path: +-JOIN HASH [Cost: 8, Rows: 2 (NO STATISTICS)] (PATH ID: 1) Outer (LOCAL ROUND ROBIN)
| Join Cond: (f.x = a.x)
| Execute on: All Nodes
| +-- Outer -> SELECT [Cost: 4, Rows: 4 (NO STATISTICS)] (PATH ID: 2)
| | Execute on: All Nodes
| | +---> UNION ALL [Cost: 4, Rows: 4 (NO STATISTICS)] (PATH ID: 3)
| | | Execute on: All Nodes
| | | Runtime Filter: (SIP1(HashJoin): f.x)
| | | Execute on: All Nodes
| | | +---> STORAGE ACCESS for foo [Cost: 2, Rows: 2 (NO STATISTICS)] (PATH ID: 5)
| | | | Projection: public.foo_b0
| | | | Materialize: foo.x
| | | | Execute on: All Nodes
| | | +---> STORAGE ACCESS for bar [Cost: 2, Rows: 2 (NO STATISTICS)] (PATH ID: 7)
| | | | Projection: public.bar_b0
| | | | Materialize: bar.x
| | | | Execute on: All Nodes
| +-- Inner -> STORAGE ACCESS for a [Cost: 3, Rows: 1 (NO STATISTICS)] (PATH ID: 8)
| | Projection: public.ace_b0
| | Materialize: a.x
| | Filter: (a.y = 1)
| | Execute on: All Nodes
0
Comments
You need to gather statistics on your table , please do that and update if it help .
Before starting, lets bring the query to ANSI-92 standards: For the predicate pushdown to occur sooner, the join has to occur before the union: This produces: The above plan has better parallelism and is more favorable as it doesn't need to materialize
b.y
like below.I am not sure , SIP is being applay to the Union All results .
kevinwilkinson , did you have the chance to talk to support ?
Sorry about miss understanding your comment .
Anyway , this ability is mandatory to have , the workaround you suggest is more costly that real SIP push into View
norbert: thanks for your suggestion. i think my query is functionally equivalent to your rewrites. but, that goes to the heart of my question. why should i have to manually rewrite? what is it about "union all" that suppresses SIP pushdown. perhaps another way to see it is, run explain on
select f.x, f.y from foo f, ace a
where f.x = a.x and f.y = a.y and a.z = 1;
you'll see SIP pushdown into the scan of table foo. but, when i add the union all over tables foo and bar, there's no pushdown. i don't understand why.
I'm not well versed with Vertica's materialization strategies. However, I can tell you that logically it makes sense that the inner
UNION ALL
is evaluated first. This is because that result set may not necessarily be used in a later join, or there may be other predicates involved in theUNION ALL
which are not eligible for SIP.There's an excellent paper on the technique in IEEE named Materialization Strategies in the Vertica Analytic Database: Lessons Learned. If you're on the HP network, I can help you find it.