Wrong query planner estimates for empty tables
Hi, looking into the QUERY_EVENTS table I sometimes see records like this:

event_type        | NO HISTOGRAM
event_description | The optimizer encountered a predicate on a column for which it does not have a histogram
event_details     | No histogram for db_zdq1m22nx6apb34mk6jdzal1wjv2et8n.qt_40b2c11df5d00551402b043460a51.def_5bbebebccbfc7ca2a0bfc06f40e56d5e
suggested_action  | analyze_statistics('db_zdq1m22nx6apb34mk6jdzal1wjv2et8n.qt_40b2c11df5d00551402b043460a51.def_5bbebebccbfc7ca2a0bfc06f40e56d5e');

The point is that the table in question, qt_40b2c11df5d00551402b043460a51, has in fact been analyzed, but it is empty. The query planner cannot see any statistics for it and therefore assumes the table has 10K rows. This can lead to an inefficient execution plan, or to a large overestimation of the query's memory usage, which in turn may cause the query to be postponed for insufficient resources when the system is under heavy load.

Any ideas how big a problem this could be (say, when a query uses 3 or more such empty tables), and especially what to do about it? Note that our application built on Vertica uses several CREATE TABLE AS SELECT statements that depend on each other, so we cannot tell in advance whether those intermediate tables will be empty or not.
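For reference, the two pieces involved can be sketched as plain SQL (identifiers copied from the event record above; this is just the pattern, and the exact columns available for filtering QUERY_EVENTS may vary by Vertica version):

```sql
-- Run the suggested action from the event record, i.e. collect
-- statistics on the flagged column:
SELECT ANALYZE_STATISTICS('db_zdq1m22nx6apb34mk6jdzal1wjv2et8n.qt_40b2c11df5d00551402b043460a51.def_5bbebebccbfc7ca2a0bfc06f40e56d5e');

-- Then check whether the planner still reports missing histograms:
SELECT event_type, event_description, event_details, suggested_action
FROM query_events
WHERE event_type = 'NO HISTOGRAM';
```

In a CTAS pipeline, the natural place for the ANALYZE_STATISTICS call is immediately after each CREATE TABLE AS SELECT, though (as described above) this does not help when the resulting table is empty.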
Comments
Having a 10K row estimate instead of the actual 0 rows isn't likely to cause an issue. 10K is very close to 0, so that minor difference is unlikely to produce a suboptimal query plan. The opposite case is the real risk: a 10K row estimate when the actual row count is millions or billions can lead the optimizer to poor plan choices.
Memory is allocated for queries based on the query plan's operators, not based on the estimated row counts.
--Sharon