How to deal with right-hand inserts
This is a follow-up question for optimizer team presentation on BDC.
Right-hand inserts happens when table is continuously being populated with ever increasing values in some columns.
For example, table partitioned by date, and each date you adding new day.
Or, column have a time, and you are continuously loading data and value is current time.
Problem is that right-hand insert never have a fixed upper limit on value. For time columns, if you collect statistics that detected min/max value for each column, for time columns result is outdated next minute as new data arrived.
My users querying today's data most. Optimizer detects that date is "value out of range", and predict only few rows will be selected from table. And it is for table that gets loaded almost 100 bln rows per day.
As result, optimizer decided to use broadcast huge table and use it as inner for join. That would be my choice for worst possible execution plan.
I am routinely finding bad queries where optimizer have warning "value out of range" and broadcasting huge table.
Usually it takes me 30 minutes to add hints and force right execution plan. In most cases SQL performance improving around 2 orders of magnitude after manual tuning. Key steps in SQL tuning are preventing broadcasts of huge tables and preventing huge tables to be used as inner in join.
I tried to raise issue with Vertica support, their recommendation was to collect statistics more often. That is a very poor suggestion, as collecting statistics on several petabytes of data is very resource consuming and lasts several hours. Also, tables with size over petabytes are practically not changing, collecting statistics on them again and again are useless resource waste. It would be enough to collect statistics once a year on humongous tables.
Question to optimizer team - any plans to fix this issue, or provide any workaround or config parameter etc.
Issue summary: tables with right-hand inserts never have a valid min/max values on time/date columns, causing optimizer to severely underestimate row count for huge tables and produce very inefficient execution plan.