Analyze_histogram and statistics in Vertica 6.1

Hi there, I just had some questions on how vertica generates statistics. I was under the impression Vertica automatically does not do this, and I would need to run analyze_histogram manually after a big load/partition drop, etc. But when I was looking at the system table v_monitor.projection_columns, I see the column "statistics_updated_timestamp" being updated. And I don't think it's any of the processes we have in-place but Vertica itself doing it. Am I wrong on this? Is Vertica updating statistics after a load or something? Or I misinterpreting the information somehow (or confusing myself). Also - I had a question on the usage of ANALYZE_HISTOGRAM if I do SELECT ANALYZE_HISTOGRAM('table_name',100) -- will it analyze statistics on each individual column by reading 100% of the data (or whatever row limit there is). What is the performance difference between analyzing histogram on table level, instead of of running analyze_histogram on each column in a table? Hope to hear from somebody! Thanks in advance.


  • Options
    Hi Sam, Those are some good questions. I would recommend that you take a look through the "Collecting Database Statistics" section of the Administrator's Guide: https://my.vertica.com/docs/6.1.x/PDF/HP_Vertica_6.1.x_AdminGuide.pdf It will, I believe, address several of them. In particular, regarding your first question, note the subsection on ANALYZE ROW COUNT -- there are more types of statistics than just histograms being gathered, and some of the cheaper statistics are gathered automatically. Regarding ANALYZE_HISTOGRAM(..., 100), yes, that will read 100% of the data. It's therefore rather expensive; not usually recommended. Running analyze_histogram on the whole table is just a shortcut for running it on each column; there should be no meaningful performance difference. Adam

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file