Sampling in ANALYZE_STATISTICS

andrii Vertica Customer

Hello,
I have a question about the documentation page "Collecting table statistics". According to the last table (at the bottom of the page), Vertica does two levels of sampling when "Number of column rows" = 4000K: the first time it takes 400K rows, and the second time 131,072 rows. What is the reason behind this approach? Why not just sample 131,072 rows directly?

Answers

  • Bryan_H Vertica Employee Administrator

    Digging further into the ANALYZE_STATISTICS function page, the first read of 400K rows (10%) is there to speed up the analysis by reading less data from disk. Scanning all 4000K rows and then selecting 131,072 of them would give higher accuracy, but the tradeoff is a longer read from disk.

  • andrii Vertica Customer
    edited August 3

    Thanks
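
For anyone who wants to see the arithmetic behind the answer above, here is a minimal Python sketch of two-level sampling. It is not Vertica's implementation: the function name `two_level_sample` and the uniform row-level sampling are assumptions made for illustration only. The figures (4000K rows, 10%, 131,072) are the ones quoted in this thread.

```python
import random

TOTAL_ROWS = 4_000_000   # "4000K" column rows, from the question
FIRST_PASS_PERCENT = 10  # share of rows read from disk, per the answer
FINAL_SAMPLE = 131_072   # rows kept for the statistics (2**17)


def two_level_sample(total_rows, first_pass_percent, final_size, seed=0):
    """Hypothetical helper: return (rows_read, final_sample) over row ids.

    Models rows as integer ids 0..total_rows-1 and samples uniformly
    without replacement at both levels (an assumption, not Vertica's method).
    """
    rng = random.Random(seed)
    # Level 1: decide which rows are read at all (the expensive disk step).
    first_size = total_rows * first_pass_percent // 100
    first_pass = rng.sample(range(total_rows), first_size)
    # Level 2: downsample the rows already in memory to the final size.
    final = rng.sample(first_pass, min(final_size, first_size))
    return first_size, final


if __name__ == "__main__":
    rows_read, sample = two_level_sample(
        TOTAL_ROWS, FIRST_PASS_PERCENT, FINAL_SAMPLE
    )
    print("rows read from disk:", rows_read)
    print("rows in final sample:", len(sample))
    print("fraction of table read:", rows_read / TOTAL_ROWS)
```

In this model the disk step touches 400,000 rows instead of all 4,000,000, while the statistics are still built from 131,072 rows. The cost is that the final sample can only contain rows that survived the first pass.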
