In Vertica, does the number of projections for a table nullify the benefits of compression?

In Vertica, does the number of projections for a table nullify the benefits of compression? Is this really true?

Vertica articles/papers say that you will get 50 to 80 percent compression. How do you calculate this? 

Comments

  • Hi Mrao,

    We think those numbers are reasonable for typical data sets, but Vertica's compression levels are very data-dependent. Some people will see much worse compression, and some will actually see much better. Those numbers typically compare raw data size against total Vertica disk utilization.

    If you want to know how your data will compress, go try it. Load some of your data, then run the Database Designer, which will enable our more advanced compression mechanisms and create additional projections for your queries as needed. Then keep loading data and see how much disk space you're using.

    Adam
  • It is also true that you can outweigh the benefits of compression by creating lots and lots of projections, since each projection stores its own copy of the data. This is NOT a good practice, in the same way that secondary indexes are not liberally created in other databases to aid query performance.
    Usually 2 segmented projections suffice. Also avoid replicated projections for larger tables, and you will keep the compression benefits.

    To measure the compression ratio, you need the raw data size in bytes, which you then compare with the Vertica storage size (you can use the Linux df command or the projection_storage system table).

  • Thanks for the reply and that helps.
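
The measurement described in the comments above can be sketched in Python. This is only an illustration under stated assumptions: the helper names (compression_percent, effective_percent) and the table name my_table are hypothetical, the byte counts are made-up round numbers, and the query assumes that projection_storage exposes used_bytes and anchor_table_name columns, which you should check against your Vertica version. It also assumes, as a simplification, that every projection of the table takes about the same space; in practice different sort orders and encodings compress differently.

```python
# Hypothetical query text: sums on-disk bytes over all projections of one table.
# Column names (used_bytes, anchor_table_name) are assumed, verify them first.
STORAGE_QUERY = """
SELECT anchor_table_name,
       COUNT(DISTINCT projection_name) AS projections,
       SUM(used_bytes)                 AS used_bytes
FROM v_monitor.projection_storage
WHERE anchor_table_name = 'my_table'
GROUP BY anchor_table_name;
"""


def compression_percent(raw_bytes, stored_bytes):
    """Percent of space saved relative to the raw (uncompressed) input size."""
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return 100.0 * (1.0 - stored_bytes / raw_bytes)


def effective_percent(raw_bytes, per_projection_bytes):
    """Savings once the storage of every projection is added together."""
    return compression_percent(raw_bytes, sum(per_projection_bytes))


if __name__ == "__main__":
    raw = 100          # made-up raw size, in GB for readability
    one_copy = 25      # made-up compressed size of a single projection
    for count in (1, 2, 4):
        saved = effective_percent(raw, [one_copy] * count)
        print(f"{count} projection(s): {saved:.0f}% saved")
```

With these illustrative numbers, one projection saves 75 percent, two save 50 percent, and four save nothing, which is the sense in which extra projections can cancel out compression.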
