Updating table statistics?
This discussion has been closed.
The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.
Join us there to post discussion topics, learn about
product releases, share tips, access the blog, and much more.
Answers
You can have Vertica build ANALYZE_STATISTICS commands to analyze stats on projections that do not have stats... Then have crontab run them.
Simple Example:
So your crontab entry might run once a day like this:
00 00 * * * /home/dbadmin/scripts/analyze_stats.sh > /home/dbadmin/scripts/analyze_stats.out 2>&1
Where the analyze_stats.sh script looks like this:
@Jim_Knicely How frequently will the statistics become stale? I ran the select analyze_statistics() for once manually about 3 days ago, and i have more that 750M records loaded into db everyday but as of today i can still see 'statistics_type' column in 'projection_columns' table is still having value of 'Full'
Also read somewhere that running analyze_statistics() unnecessarily on the table increases catalog size, is that true?
The info 'FULL' for the
statistics_type
means just one of the possible values: 'FULL','ROWCOUNT','NONE'. That also means that 'FULL' statistics can very well be stale.Imagine you have a
TIMESTAMP
column namedcreation_ts
,DEFAULT CURRENT_TIMESTAMP
. If you ranANALYZE_STATISTICS()
on that column yesterday and today you inserted more rows - the statistics on that column will be full, but still, every query asking for today's data , say,WHERE creation_ts >= TRUNC(CURRENT_TIMESTAMP,'DD') AND creation_ts < TRUNC(CURRENT_TIMESTAMP,'DD') + 1
will search for data for which no statistics have yet been gathered. For that type of query, the table/column's statistics would be too stale to allow for an efficient query plan, no matter what thestatistics_type
for that column may be.