The Vertica Forum recently got a makeover! Let us know what you think by filling out this short, anonymous survey.
Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

Analyzing Table Statistics by Column

This blog post was authored by Jim Knicely.

The ANALYZE_STATISTICS function collects and aggregates data samples and storage information from all nodes that store projections associated with the specified table. On a very large wide table, it takes a significant amount of time to gather those statistics.

In many situations only a few columns in the table are modified as part of an ETL/ELT process. In these cases instead of reanalyzing all of the table’s columns, you can analyze just the updated columns, reducing the ANALYZE_STATISTICS function’s run time!


=> SELECT COUNT(*) FROM big_varchar_table;
(1 row)

=> \timing
Timing is on.

=> select analyze_statistics('big_varchar_table', 100); -- Analyze entire table
(1 row)

Time: First fetch (1 row): 196707.123 ms. All rows formatted: 196707.185 ms

=> UPDATE big_varchar_table SET the_varchar = 'DATA UPDATED...' WHERE pk BETWEEN 2098174 AND 4098174;
(1 row)

Time: First fetch (1 row): 1909.409 ms. All rows formatted: 1909.474 ms

=> select analyze_statistics('big_varchar_table', 'the_varchar', 100); -- Analyze only updated column
(1 row)

Time: First fetch (1 row): 48348.982 ms. All rows formatted: 48349.028 ms

By analyzing only the column that was updated I reduced the run time of the ANALYZE_STATISTICS function from 3.3 minutes to 48 seconds!

Sign In or Register to comment.