We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


ANALYZE_STATISTICS and ANALYZE_STATISTICS_PARTITION — Vertica Forum

ANALYZE_STATISTICS and ANALYZE_STATISTICS_PARTITION

dimitri_pdimitri_p
edited March 21 in General Discussion

I have a huge partitioned table that I've been ANALYZE_STATISTICS() for regularly. Recently, I decided to switch to hourly ANALYZE_STATISTICS_PARTITION for new partitions only.

In order to have a full statistics coverage, do I need to run ANALYZE_STATISTICS_PARTITION on every single existing partition once to sort of create a statistics baseline, before I switch to hourly ANALYZE_STATISTICS_PARTITION?

Is having statistics on each individual partition in a partitioned table equivalent to having run analyze_statistics() on the entire table?

Answers

  • Huge tables does not change statistics quickly. Analysing once a week over weekend should be more than enough.
    Vertica had problem - if you have right-handed insert into some column, for example each new record has current timestamp, value will go above column max value recorded during statistics collection. If user issue a query that selects recent data, optimiser will throw warning "value out of range" and will disable a bunch of internal optimisations.
    That was the only reason to run analyse partition for recent data often, like hourly.
    Still it does not help if user is selecting last minute data.
    Addressing this problem is not straightforward. I have python code that export statistics, tweak it to extend max value in some columns, and import back. You can find code on github.
    Vertica v 25.1 (if I remember correctly version no), finally announced that "optimiser value out of range" problem has been addressed, though there is no mention how it is done.
    Easiest way to address problem is to upgrade to v 25.1 and collect stats weekly, and forget about hourly statistics collection on recent partition.
    You will be able to verify how it works, by checking optimiser events.

  • interesting insights, thanks, Sergey, I appreciate your help!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file