Analyzing Table Statistics by Column
[Deleted User]
Administrator
This blog post was authored by Jim Knicely.
The ANALYZE_STATISTICS function collects and aggregates data samples and storage information from all nodes that store projections associated with the specified table. On a very large wide table, it takes a significant amount of time to gather those statistics.
In many situations only a few columns in the table are modified as part of an ETL/ELT process. In these cases instead of reanalyzing all of the table’s columns, you can analyze just the updated columns, reducing the ANALYZE_STATISTICS function’s run time!
Example:
=> SELECT COUNT(*) FROM big_varchar_table; COUNT ------------ 1000000000 (1 row) => \timing Timing is on. => select analyze_statistics('big_varchar_table', 100); -- Analyze entire table analyze_statistics -------------------- 0 (1 row) Time: First fetch (1 row): 196707.123 ms. All rows formatted: 196707.185 ms => UPDATE big_varchar_table SET the_varchar = 'DATA UPDATED...' WHERE pk BETWEEN 2098174 AND 4098174; OUTPUT --------- 2000001 (1 row) Time: First fetch (1 row): 1909.409 ms. All rows formatted: 1909.474 ms => select analyze_statistics('big_varchar_table', 'the_varchar', 100); -- Analyze only updated column analyze_statistics -------------------- 0 (1 row) Time: First fetch (1 row): 48348.982 ms. All rows formatted: 48349.028 ms
By analyzing only the column that was updated I reduced the run time of the ANALYZE_STATISTICS function from 3.3 minutes to 48 seconds!
1