Options

System table table_statistics is missing column "table_schema"

Sergey_Cherepan_1Sergey_Cherepan_1 ✭✭✭
edited April 2023 in General Discussion

Hi,

System table table_statistics is missing column "table_schema" in v 12.0.3.

Here is explanation why do I care about missing column in system table.

Doing some random query tuning, I found that small table is reporting as having no statistics, or PREDICATE VALUE OUT OF RANGE.

Investigation show, that small table is reloaded daily, by truncating and inserting full data.
It appears that table truncate is invalidating table statistics, and I completely agree with it. I advised user to analyse table after full data reload.

Now there is a question how to detect tables that has invalidated statistics, to make sure all ETL are properly collect statistics. Fortunately, Vertica has system table table_statistics. It contains number of rows. If no statistics has been collected, or table has been truncated, number of rows is set to 1.

That would be perfect solution to find all tables with invalidated statistics.

Unfortunately, system table table_statistics is missing column "table_schema" and has only table_name. That makes detection of tables with invalidated statistics slightly inconvenient, some tables have same name in several schemes.

Can you take a look in system view definition, and post here query using internal views, that will show missing table_schema column.

May be, you can just write SQL that is using internal views that will show tables where statistics has been invalidated. That is actually what is needed. Probably a good idea in next Vertica release to add bool column to table_statistics - invalidated_statistics.

Thank you
Sergey

Answers

  • Options
    Bryan_HBryan_H Vertica Employee Administrator

    "table_statistics" is a view, you can get the definition by running:
    select * from vs_system_views where view_name = 'table_statistics';
    then write a view that includes schema field from vs_tables.
    I will log a bug for this because you are correct that it is common to have same table name in different schemas (for staging, dev, etc.) so this view should also expose a unique ID - probably should include table OID (primary key) and schema.

  • Options

    Can you consider add to bug one more request - add bool column that will show if stats was invalidated, due to truncate etc?

  • Options

    As expected, creating view with schema name is trivial.
    On other side, internal system views does not have field that will show invalidated stats on table. That seems to be not as trivial.

  • Options

    For reference, this is SQL to show tables with invalidated and not collected statistics:

    select t.table_schema, t.table_name, ps.row_cnt as table_row_count,
    case when ls.row_count = 1 then 'Invalidated' else 'Not collected' end as status,
    ls.stat_collection_time, t.create_time
    from tables t
    join (select anchor_table_id, sum(row_count)//2 row_cnt
    from projection_storage
    group by anchor_table_id) ps
    on ps.anchor_table_id = t.table_id
    join "v_internal"."vs_logical_statistics" ls
    on ls.table_oid = t.table_id
    where (ls.row_count = 1
    and ps.row_cnt > 1)
    or ls.row_count = -1;

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file