We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


A case of disappearing statistics — Vertica Forum

A case of disappearing statistics

dimitri_pdimitri_p
edited May 2 in General Discussion

We regularly run analyze_statistics on one of our tables, and I noticed that the statistics vanishes just as regularly.

SELECT distinct statistics_type FROM projection_columns WHERE table_name = 'our_table';

changes to FULL after running

select analyze_statistics('our_table');

but after some time it is back to very backdated ROWCOUNT statistics.

The only way I could think of for this to happen is if someone or something was running 'drop_statistics' on the table. But no such thing is happening per QUERY_REQUESTS. Plus, once you run drop_statistics, the current statistics type resets to NONE and not to some ancient ROWCOUNT.

Any idea how this is possible?

NB: that is on old Vertica 9.2

Answers

  • dimitri_pdimitri_p
    edited May 2

    It looks like our_table has multiple projections with one - having stubborn ancient ROWCOUNT statistics (from 2023) that are somehow immune to analyze_statistics. I am not sure how it is possible. I will be trying to use drop_statistics('our_table') hoping that it will wipe out the stubborn old statistics that has somehow stuck and is probably causing the issue of the statistics disappearance.

  • Well known problem with Vertica. After table undergo a lot of data loads or deletes, it decides that it had too much changes and discard all collected statistics. Sure sign of it - ROWCOUNT statistics type on table.
    While that can be a right approach for smallish tables (under 100TB), it is extremely disrupting for petabyte sized tables. Statistics get lost after couple hours of normal business operations that involves very intensive data loading. Collecting statistics on petabyte tables is expensive, and mostly useless because it will be lost when it will be most needed.
    I am not aware how to change this behaviour. This is something new introduced around couple of years ago.
    For workaround seems to be best approach is to export statistics into XML after collecting, while it is good. Then periodically, like once in an hour, restore statistics from XML.
    That will keep statistics in place, even you are loading a lot of data.
    Next, there is a problem of PREDICATE VALUE OUT OF RANGE. For columns that undergo right-hand inserts, like timestamp column that has current time of insert, you cannot have good statistics in principle. I am tweaking statistics XML and extending stats to cover next week for columns with right-hand inserts.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file