We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Updating table statistics? — Vertica Forum

Updating table statistics?

rjs_docsrjs_docs Vertica Employee Employee
edited April 2020 in General Discussion

Answers

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    You can have Vertica build ANALYZE_STATISTICS commands to analyze stats on projections that do not have stats... Then have crontab run them.

    Simple Example:

    dbadmin=> SELECT DISTINCT 'SELECT analyze_statistics(''' || p.projection_schema || '.' || p.projection_basename || ''');' run_these FROM projections p JOIN projection_storage ps USING (projection_id) WHERE NOT p.has_statistics AND NOT p.is_aggregate_projection AND ps.row_count > 0;
                       run_these
    ------------------------------------------------
     SELECT analyze_statistics('public.some_fact');
     SELECT analyze_statistics('public.no_stats');
    (2 rows)
    
    dbadmin=> \! vsql -Atc "SELECT DISTINCT 'SELECT analyze_statistics(''' || p.projection_schema || '.' || p.projection_basename || ''');' run_these FROM projections p JOIN projection_storage ps USING (projection_id) WHERE NOT p.has_statistics AND NOT p.is_aggregate_projection AND ps.row_count > 0;" | vsql -a
    SELECT analyze_statistics('public.some_fact');
     analyze_statistics
    --------------------
                      0
    (1 row)
    
    SELECT analyze_statistics('public.no_stats');
     analyze_statistics
    --------------------
                      0
    (1 row)
    
    dbadmin=> SELECT DISTINCT 'SELECT analyze_statistics(''' || p.projection_schema || '.' || p.projection_basename || ''');' run_these FROM projections p JOIN projection_storage ps USING (projection_id) WHERE NOT p.has_statistics AND NOT p.is_aggregate_projection AND ps.row_count > 0;
     run_these
    -----------
    (0 rows)
    

    So your crontab entry might run once a day like this:

    00 00 * * * /home/dbadmin/scripts/analyze_stats.sh > /home/dbadmin/scripts/analyze_stats.out 2>&1

    Where the analyze_stats.sh script looks like this:

    dbadmin=> \! cat /home/dbadmin/scripts/analyze_stats.sh
    vsql -Atc "SELECT DISTINCT 'SELECT analyze_statistics(''' || p.projection_schema || '.' || p.projection_basename || ''');' run_these FROM projections p JOIN projection_storage ps USING (projection_id) WHERE NOT p.has_statistics AND NOT p.is_aggregate_projection AND ps.row_count > 1;" | vsql -a
    
  • @Jim_Knicely How frequently will the statistics become stale? I ran the select analyze_statistics() for once manually about 3 days ago, and i have more that 750M records loaded into db everyday but as of today i can still see 'statistics_type' column in 'projection_columns' table is still having value of 'Full'

    Also read somewhere that running analyze_statistics() unnecessarily on the table increases catalog size, is that true?

  • marcothesanemarcothesane - Select Field - Administrator

    The info 'FULL' for the statistics_type means just one of the possible values: 'FULL','ROWCOUNT','NONE'. That also means that 'FULL' statistics can very well be stale.

    Imagine you have a TIMESTAMP column named creation_ts, DEFAULT CURRENT_TIMESTAMP. If you ran ANALYZE_STATISTICS() on that column yesterday and today you inserted more rows - the statistics on that column will be full, but still, every query asking for today's data , say, WHERE creation_ts >= TRUNC(CURRENT_TIMESTAMP,'DD') AND creation_ts < TRUNC(CURRENT_TIMESTAMP,'DD') + 1 will search for data for which no statistics have yet been gathered. For that type of query, the table/column's statistics would be too stale to allow for an efficient query plan, no matter what the statistics_type for that column may be.

This discussion has been closed.