
Compression ratio calculation including copies of data

Everywhere I've read, the compression ratio per table across the cluster seems to be calculated this way:

data usage on disk per table / raw data per table. This result is consistent with the compression ratio reported by "collect_diag_dump.sh -c".

 

Data usage on disk is calculated as:

 

SELECT anchor_table_name, SUM(ros_used_bytes) FROM projection_storage WHERE anchor_table_name = 'tableX' GROUP BY anchor_table_name;

 

and raw data usage per table as:

 

SELECT AUDIT('tableX');

 

The problem I see with this calculation is that the compressed data usage per table, computed this way, includes all projection copies, while the raw data size returned by AUDIT doesn't include copies.
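To make the mismatch concrete, here is a minimal sketch with made-up numbers. It assumes every projection of the table holds a full copy of the data (for example a superprojection and its buddy); the projection names and byte counts are hypothetical, not output from a real cluster.

```python
def compression_ratios(raw_bytes, projection_bytes):
    """Return (mixed_ratio, like_for_like_ratio) for one table.

    raw_bytes: raw size of ONE copy of the data (what AUDIT would report).
    projection_bytes: dict of projection name -> compressed bytes on disk.
    Assumption: every projection stores a full copy of the table.
    """
    total_on_disk = sum(projection_bytes.values())
    copies = len(projection_bytes)

    # Ratio as described in the post: all compressed copies vs. one raw copy.
    mixed_ratio = total_on_disk / raw_bytes

    # Like for like: all compressed copies vs. the same number of raw copies.
    like_for_like_ratio = total_on_disk / (raw_bytes * copies)
    return mixed_ratio, like_for_like_ratio


GIB = 1024 ** 3
raw = 100 * GIB  # hypothetical raw size of a single copy
projections = {  # hypothetical projections, each a full copy
    "tableX_b0": 20 * GIB,
    "tableX_b1": 20 * GIB,
}

mixed, like_for_like = compression_ratios(raw, projections)
print(f"mixed ratio:         {mixed:.2f}")
print(f"like-for-like ratio: {like_for_like:.2f}")
```

With two full copies, the mixed ratio comes out twice as large as the like-for-like one, which is exactly the discrepancy I'm asking about.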

 

My question is:

 

Why does the compression ratio calculation take copies of data into account for the compressed size but not for the raw size?

If I want to know how much my data is compressed, shouldn't the numerator and denominator of the equation count the same data (i.e. either both include copies or neither does)?

 

Thanks

 

 
