Updating table statistics?

rjs_docsrjs_docs Vertica Employee Employee
edited April 2020 in General Discussion

Answers

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    You can have Vertica build ANALYZE_STATISTICS commands to analyze stats on projections that do not have stats... Then have crontab run them.

    Simple Example:

    dbadmin=> SELECT DISTINCT 'SELECT analyze_statistics(''' || p.projection_schema || '.' || p.projection_basename || ''');' run_these FROM projections p JOIN projection_storage ps USING (projection_id) WHERE NOT p.has_statistics AND NOT p.is_aggregate_projection AND ps.row_count > 0;
                       run_these
    ------------------------------------------------
     SELECT analyze_statistics('public.some_fact');
     SELECT analyze_statistics('public.no_stats');
    (2 rows)
    
    dbadmin=> \! vsql -Atc "SELECT DISTINCT 'SELECT analyze_statistics(''' || p.projection_schema || '.' || p.projection_basename || ''');' run_these FROM projections p JOIN projection_storage ps USING (projection_id) WHERE NOT p.has_statistics AND NOT p.is_aggregate_projection AND ps.row_count > 0;" | vsql -a
    SELECT analyze_statistics('public.some_fact');
     analyze_statistics
    --------------------
                      0
    (1 row)
    
    SELECT analyze_statistics('public.no_stats');
     analyze_statistics
    --------------------
                      0
    (1 row)
    
    dbadmin=> SELECT DISTINCT 'SELECT analyze_statistics(''' || p.projection_schema || '.' || p.projection_basename || ''');' run_these FROM projections p JOIN projection_storage ps USING (projection_id) WHERE NOT p.has_statistics AND NOT p.is_aggregate_projection AND ps.row_count > 0;
     run_these
    -----------
    (0 rows)
    

    So your crontab entry might run once a day like this:

    00 00 * * * /home/dbadmin/scripts/analyze_stats.sh > /home/dbadmin/scripts/analyze_stats.out 2>&1

    Where the analyze_stats.sh script looks like this:

    dbadmin=> \! cat /home/dbadmin/scripts/analyze_stats.sh
    vsql -Atc "SELECT DISTINCT 'SELECT analyze_statistics(''' || p.projection_schema || '.' || p.projection_basename || ''');' run_these FROM projections p JOIN projection_storage ps USING (projection_id) WHERE NOT p.has_statistics AND NOT p.is_aggregate_projection AND ps.row_count > 1;" | vsql -a
    
  • @Jim_Knicely How frequently will the statistics become stale? I ran the select analyze_statistics() for once manually about 3 days ago, and i have more that 750M records loaded into db everyday but as of today i can still see 'statistics_type' column in 'projection_columns' table is still having value of 'Full'

    Also read somewhere that running analyze_statistics() unnecessarily on the table increases catalog size, is that true?

  • marcothesanemarcothesane - Select Field - Administrator

    The info 'FULL' for the statistics_type means just one of the possible values: 'FULL','ROWCOUNT','NONE'. That also means that 'FULL' statistics can very well be stale.

    Imagine you have a TIMESTAMP column named creation_ts, DEFAULT CURRENT_TIMESTAMP. If you ran ANALYZE_STATISTICS() on that column yesterday and today you inserted more rows - the statistics on that column will be full, but still, every query asking for today's data , say, WHERE creation_ts >= TRUNC(CURRENT_TIMESTAMP,'DD') AND creation_ts < TRUNC(CURRENT_TIMESTAMP,'DD') + 1 will search for data for which no statistics have yet been gathered. For that type of query, the table/column's statistics would be too stale to allow for an efficient query plan, no matter what the statistics_type for that column may be.

This discussion has been closed.