
Why some projections have a compression ratio less than 1

Problem:

We generated the Vertica compression output using "collect_diag_dump.sh -c". However, for some projections the compressed size (Vertica(MB)) is greater than the raw size.

Please see the output below. The relevant columns are RawSize(MB), Vertica(MB), and Compr.Ratio.

TableName                           PjCnt  PjType  RowCount   RawSize(MB)  Vertica(MB)  Compr.Ratio
===================================================================================================
temp.dwh_fact_invites_rtm_guid      3      S       145985594  46809.04     53595.23     0.87
temp.dwh_fact_invites_client_Type   3      S       145985594  46809.60     52706.82     0.89
temp.dwh_fact_day_id                3      S       145985594  46808.90     52610.92     0.89
temp.dwh_fact_invites_suc_ct        3      S       145985594  46810.16     51368.85     0.91
temp.dwh_fact_mult                  3      S       145985594  46808.35     48434.29     0.97


Solution:

The PjCnt column in the output above is the number of projections on the corresponding table. In this case, each table has 3 projections (1 superprojection and 2 buddy projections).

The compression ratio is the raw size of the table divided by the Vertica size of all projections on that table. The raw size is counted only once, whereas the Vertica size is the sum of the storage used by every projection. Because each of these tables has 3 projections, its Vertica size is somewhat larger than its raw size, so the compression ratio is less than 1.
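As a rough check, the arithmetic can be reproduced from the rows above. The sketch below recomputes Compr.Ratio as RawSize(MB) / Vertica(MB) and also estimates a per-projection ratio under a hypothetical assumption: that the 3 projections of each table use roughly equal storage. The `Row` class and the equal-split assumption are illustrative only; they are not part of collect_diag_dump.sh.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One line of the collect_diag_dump.sh -c output (hypothetical container)."""
    table: str
    pj_cnt: int
    raw_mb: float
    vertica_mb: float

    def compr_ratio(self) -> float:
        # Raw size counted once, divided by storage summed over all projections.
        return self.raw_mb / self.vertica_mb

    def per_projection_ratio(self) -> float:
        # Assumption: every projection uses about the same storage,
        # so one projection's share is vertica_mb / pj_cnt.
        return self.raw_mb / (self.vertica_mb / self.pj_cnt)


rows = [
    Row("temp.dwh_fact_invites_rtm_guid", 3, 46809.04, 53595.23),
    Row("temp.dwh_fact_invites_client_Type", 3, 46809.60, 52706.82),
    Row("temp.dwh_fact_day_id", 3, 46808.90, 52610.92),
    Row("temp.dwh_fact_invites_suc_ct", 3, 46810.16, 51368.85),
    Row("temp.dwh_fact_mult", 3, 46808.35, 48434.29),
]

for r in rows:
    print(f"{r.table:<36} ratio={r.compr_ratio():.2f} "
          f"per-projection~{r.per_projection_ratio():.2f}")
```

Under the equal-split assumption, each individual projection still compresses the raw data by well over 2x; the overall ratio only drops below 1 because three copies are summed.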

This is expected. If you keep adding projections to a table, its Vertica size grows while its raw size stays the same, so the compression ratio decreases.
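To illustrate the trend, the sketch below holds the raw size of temp.dwh_fact_invites_rtm_guid fixed and adds projections one at a time. It assumes, purely for illustration, that each projection costs the same storage as the current average (53595.23 / 3 MB). Real projections with different columns, sort orders, or encodings will use different amounts of storage.

```python
def ratio_for(raw_mb: float, per_projection_mb: float, pj_cnt: int) -> float:
    """Compression ratio when pj_cnt equally sized projections cover the same raw data."""
    return raw_mb / (per_projection_mb * pj_cnt)


raw_mb = 46809.04                  # RawSize(MB): counted once, regardless of projection count
per_projection_mb = 53595.23 / 3   # assumed average storage per projection

for pj_cnt in range(1, 6):
    # Storage grows linearly with pj_cnt, so the ratio shrinks as 1 / pj_cnt.
    print(pj_cnt, round(ratio_for(raw_mb, per_projection_mb, pj_cnt), 2))
```

Under this assumption the ratio stays above 1 with one or two projections and falls below 1 at three, which is consistent with the 0.87 reported for this table.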

Comments

  • Useful information. Thanks.

    If the compressed size is greater than the raw size, how will the customer view it? Won't it increase the total cost of ownership for the customer?
  • Vertica licensing is currently a function of raw data size. It is not affected by compression, nor by the number of projections you have on the same data.

    If you choose to create more projections, you will need to store those projections, so you will of course need more disk space.  This can increase TCO in that you may need to install more hard drives in your cluster.  But if you already have enough disk space, there's no other cost increase.
  • Thanks, Adam.
