Options

Export/import statistics on flex table is not working

Sergey_Cherepan_1Sergey_Cherepan_1 ✭✭✭
edited October 2023 in General Discussion

Hi,

You can analyse statistics on flex table, but export_statistics on flex table return empty schema

select export_statistics('', 'myschema.mytable');

<?xml version="1.0" encoding="utf-8"?>
<schema>
<tables>
</tables>
</schema>

That is despite of several materialised column in flex table.

One more thing - materialized columns in flex table can have dots in column name, for nested fields. Export_statistics reject attempt to export statistics for column with dot in name

select export_statistics('', 'myschema.mytable.c1.f1.d1');

[Code: 2065, SQL State: 42501] [Vertica]VJDBC ERROR: export_statistics: Invalid table/projection/column myschema.mytable.c1.f1.d1

Looks like export/import statistics on flex table is not properly supported. Can you file a request to engineering to start supporting it.

Thank you
Sergey

Answers

  • Options
    SruthiASruthiA Vertica Employee Administrator

    we already have a feature request open with engineering for it. VER-46737 is the JIRA number.

  • Options
    SruthiASruthiA Vertica Employee Administrator

    Could you please share more details on your use case and why you need it?

  • Options

    That is to address a very annoying problem - PREDICATE VALUE OUT OF RANGE
    It is caused when new data with values out of min/max range has been inserted into table after collecting statistics.
    It causes optimiser to severely underestimate row count and produce very suboptimal query plan.
    Vertica recommendation is to collect statistics on table or column after inserting new data, or very often.
    This recommendation does not work for me, with more than 10 tables with audited size over 1PB. At least, there is no information when new data with value exceeding range has been inserted. Running stats collection on petabyte tables is quite resource consuming, I am doing it only once over weekend. Periodically collecting statistics on single column still misses queries that run after new data with value outside range has been inserted, and statistics collection on column.

    My solution to problem is to detect columns that are subjected to PREDICATE VALUE OUT OF RANGE, export statistics on them, interpolate stats for next week, and import tweaked statistics back.

    I am exporting statistics on column, find if it is close to current day (i.e. within 7 days of current date) and extend statistics, by adding bucket with interpolated value, and put tweaked statistics back.

    That nicely addressed problem for whole database, I do not see PREDICATE VALUE OUT OF RANGE at all. Statistics interpolation into near future works just fine, optimiser is able to produce as optimal query plan as it can. No overhead of running frequent stats on huge tables, database stats collected only once a week over weekend.

    I am quite happy with this solution. Though I found it does not work on flex tables, hence this request.

    You can see python code on github, https://github.com/scherepanov/verticaudxutilities/blob/main/scripts/python/VerticaStatTweaker.py

    Anecdotal fact - couple of years ago I listen to all presentations for Vertica Academy session, and 3 presenters from Vertica consultants were making living on this problem - going to clients, finding and fixing it.

  • Options
    SruthiASruthiA Vertica Employee Administrator

    thank you for sharing the detailed usecase.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file