catalog size reduction

Hi,
I am checking the catalog size in two ways:
1) vertica> select object_count from dc_catalog_info;
2) > du ./Catalog/*
I created several thousand tables and object_count reached 1.7 million.
The catalog on disk reported:
2.4G Catalog/Checkpoints
1.2G Catalog/Txnlogs
3.5G total
Then I dropped most of the tables and object_count fell to 140 thousand.
But the catalog on disk still reports the same 3.5G.
It's a single-node deployment, if that's relevant.

How can I reduce the catalog size?

I tried manually running "do_tm_task" and "make_ahm_now", but nothing changed.
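For reference, the calls were roughly of this form (taking mergeout as the Tuple Mover task):
vertica=> select do_tm_task('mergeout');
vertica=> select make_ahm_now();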
I believe there is no problem with stale checkpoints or anything like that:
vertica=> select * from system;
-[ RECORD 1 ]------------+-------------
current_epoch            | 381165
ahm_epoch                | 380962
last_good_epoch          | 381154
refresh_epoch            | -1
designed_fault_tolerance | 0
node_count               | 1
node_down_count          | 0
current_fault_tolerance  | 0
catalog_revision_number  | 1096258
wos_used_bytes           | 704512
wos_row_count            | 17000
ros_used_bytes           | 227036930483
ros_row_count            | 5117798502
total_used_bytes         | 227037634995
total_row_count          | 5117815502
Vladimir

Comments

  • Catalog size is driven primarily by the count of tables, projections, and analyze statistics (number of columns). In our environment each of our schemas has 1100 tables with a high ratio (80%) of unsegmented projections, which multiplies the number of projections by the node count.

    Running analyze statistics by table (i.e., all columns) greatly increases catalog size.

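    A rough way to see what is driving the object count is to query the v_catalog system tables, for example:
    select count(*) from v_catalog.tables;
    select is_segmented, count(*) from v_catalog.projections group by is_segmented;
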
    What we are doing is several-fold:
    1. Increase the number of segmented projections
    2. Run analyze statistics on table.column only for columns used in joins, where clauses, and group by (see the sketch below)
    3. Split large, high-node-count clusters into multiple physical clusters
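
    As a sketch of point 2, a column-level call looks roughly like this (schema, table, and column names are placeholders, and the exact argument form may differ between versions):
    -- all columns of a table:
    select analyze_statistics('myschema.mytable');
    -- a single column only:
    select analyze_statistics('myschema.mytable.mycolumn');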
    Jim
  • Hi Jim,
    thanks for the comment.
    I didn't realize analyze_statistics has such a big impact on catalog size.
    We will definitely tune analyze_statistics so that we skip columns not used in join, where, and group by clauses.
    Thanks for that!
    However, I wonder how I can fix the state I have gotten into, where the catalog is too big and performance is getting worse.
    Vladimir
  • Hi

    I see a similar issue being reported at one of our clients, where the catalog size has become too large on a couple of nodes in a cluster, and the solution was to restart the database to release the memory. This incident seems to have happened multiple times in the past.

    Ideally, the catalog should be the same size across all the nodes in the cluster. I can't think of instances where only a couple of nodes would see this issue unless there is a difference in the RAM allocated.
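
    One way to compare how much memory the catalog is holding on each node might be the metadata resource pool, for example (assuming your version exposes a metadata pool in resource_pool_status):
    select node_name, memory_size_kb, memory_inuse_kb
    from resource_pool_status
    where pool_name = 'metadata';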

    However, with version 7.2.x, changes were made to reduce catalog size by consolidating statistics storage, removing unused statistics, and storing unsegmented projection metadata once per database instead of once per node. Though I'm not clear on where unsegmented projection metadata is stored if it is kept only once per database.

    Could there be any other reason why the catalog size grows so quickly and does not shrink? Even if you manage to reduce the metadata size, does this mean a restart of the database is required to release the extra memory it was holding?

    Any ideas?
