We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Analyze_histogram and statistics in Vertica 6.1 — Vertica Forum

Analyze_histogram and statistics in Vertica 6.1

Hi there, I just had some questions on how vertica generates statistics. I was under the impression Vertica automatically does not do this, and I would need to run analyze_histogram manually after a big load/partition drop, etc. But when I was looking at the system table v_monitor.projection_columns, I see the column "statistics_updated_timestamp" being updated. And I don't think it's any of the processes we have in-place but Vertica itself doing it. Am I wrong on this? Is Vertica updating statistics after a load or something? Or I misinterpreting the information somehow (or confusing myself). Also - I had a question on the usage of ANALYZE_HISTOGRAM if I do SELECT ANALYZE_HISTOGRAM('table_name',100) -- will it analyze statistics on each individual column by reading 100% of the data (or whatever row limit there is). What is the performance difference between analyzing histogram on table level, instead of of running analyze_histogram on each column in a table? Hope to hear from somebody! Thanks in advance.

Comments

  • Hi Sam, Those are some good questions. I would recommend that you take a look through the "Collecting Database Statistics" section of the Administrator's Guide: https://my.vertica.com/docs/6.1.x/PDF/HP_Vertica_6.1.x_AdminGuide.pdf It will, I believe, address several of them. In particular, regarding your first question, note the subsection on ANALYZE ROW COUNT -- there are more types of statistics than just histograms being gathered, and some of the cheaper statistics are gathered automatically. Regarding ANALYZE_HISTOGRAM(..., 100), yes, that will read 100% of the data. It's therefore rather expensive; not usually recommended. Running analyze_histogram on the whole table is just a shortcut for running it on each column; there should be no meaningful performance difference. Adam

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file