Filters with subqueries
I have a situation where my query plan is about 1000x slower because Vertica chooses a different plan with a subquery vs a constant.
E.g,
lastid AS
(
SELECT last_id as max(id) from processed_rows;
)
SELECT * From my_table m
inner join other_table x om m.a = x.a
where m.id > (select last_id from last_id )
That query does a bad plan and doesn't actually apply the filter until after the join (scanning a huge terabyte table)
lastid AS ( SELECT last_id as max(id) from processed_rows; )
SELECT * From my_table m
inner join other_table x om m.a = x.a
where m.id > 10000
This plan applies the id filter at join time (resulting in usually just a few rows to have to join)
I can't find anyway to trick the planner into applying the filter. I've tried lots of different hints and even using inner joins instead of WHERE clause filters.
If I can't find a solution I'll have to change a huge part of our ETL process to allow templating of these SQL scripts.
Comments
Are you trying to use a WITH clause? Look into "Materialization of WITH Clause":
https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/AnalyzingData/Queries/Subqueries/WithClauseMaterialization.htm
I've tried it with a materialization hint and it had the same results.
Next step could be to
CREATE LOCAL TEMPORARY TABLE foo ON COMMIT PRESERVE ROWS AS SELECT last_id AS max(id) FROM processed_rows;
- and see if that helps.If that still does not help, try creating your own constant for the where condition, by generating a SQL statement containing the MAX(id) value as a constant. Write the generated statement to a file (tuples only, unaligned, if you use vsql), then execute the just generated script.
The query to generate the script would be:
SELECT
'SELECT * '
||'FROM my_table m '
||'INNER JOIN other_table x ON m.a = x.a '
||'WHERE m.id > '
||MAX(id)::CHAR(18)
||';'
FROM processed_rows;
I have the same use case and looking for a solution as well. In certain cases it makes performance sense to materialize the subquery and use a filter rather than a join. I have started my own thread on it.
https://forum.vertica.com/discussion/239193/how-to-force-early-evaluation-of-non-correlated-subqueries#latest
Any solution for this problem?
I noticed that sometimes it does apply filter before joining.
But most of the times it does not, resulting in poor performance.
@pdhokte Could you please open a support case and share explain plan?