Understanding the difference between Encoding and Compression in a practical way

UJJWAL_RANA · July 2016

Hi

I want to understand the difference between Encoding and Compression in a practical way, i.e. how each one actually works. Any help will be highly appreciated.

Thank You

Ujjwal Rana

eli_revach · July 2016

Hi
To simplify, I would say:
Encoding: compression based on the distribution and similarity of the data values.

Compression: simple binary compression that does not take the type of the data into account.

Vertica can filter directly on encoded data; however, it must decompress compressed data before it can filter it.
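As a rough illustration of that distinction (a Python sketch, not Vertica internals): gzip stands in for generic binary compression, and storing each value as an offset from the block minimum stands in for an encoding. The function names `filter_compressed` and `filter_encoded` are hypothetical.

```python
import gzip
import struct

values = [1000, 1004, 1007, 1008, 1009]

# Compression: type-agnostic. The column becomes raw bytes and gzip
# compresses those bytes without knowing they are integers.
compressed = gzip.compress(struct.pack(f"<{len(values)}q", *values))

def filter_compressed(blob, count, threshold):
    # The predicate cannot see inside the compressed bytes, so the whole
    # blob is decompressed and unpacked first.
    data = struct.unpack(f"<{count}q", gzip.decompress(blob))
    return [v for v in data if v > threshold]

# Encoding: type-aware. Keep the smallest value plus offsets from it.
base = min(values)
offsets = [v - base for v in values]

def filter_encoded(base, offsets, threshold):
    # "value > threshold" is rewritten as "offset > threshold - base" and
    # checked on the offsets; only the matching rows are decoded.
    shifted = threshold - base
    return [base + o for o in offsets if o > shifted]

print(filter_compressed(compressed, len(values), 1005))  # [1007, 1008, 1009]
print(filter_encoded(base, offsets, 1005))               # [1007, 1008, 1009]
```

Both paths return the same rows, but only the compressed path had to undo the compression before it could evaluate the predicate.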

Thanks

UJJWAL_RANA · July 2016

Hi Eli

Can you give me a query-based example of ENCODING and COMPRESSION?

With Regards

Ujjwal

eli_revach · July 2016

Sure, let's take DELTAVAL as an example:

Vertica's definition of DELTAVAL is “data is recorded as a difference from the smallest value in the data block”. Let's assume your data block contains sequential, closely spaced numbers.

Let's say your data block contains the following values:

1000, 1004, 1007, 1008, 1009

With DELTAVAL, Vertica stores something like the smallest value, 1000, plus each value's difference from it: 0, 4, 7, 8, 9. You can see this as a kind of compression, but it is not binary compression; it is a data-aware compression that depends on the values themselves.
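A minimal Python sketch of the "difference from the smallest value" idea, applied to the block above. `deltaval_encode` and `deltaval_decode` are hypothetical names, and Vertica's real on-disk layout is not modeled here.

```python
def deltaval_encode(block):
    """Return (smallest value, list of offsets from that value)."""
    base = min(block)
    return base, [v - base for v in block]

def deltaval_decode(base, offsets):
    """Rebuild the original values from the base and the offsets."""
    return [base + o for o in offsets]

block = [1000, 1004, 1007, 1008, 1009]
base, offsets = deltaval_encode(block)
print(base, offsets)  # 1000 [0, 4, 7, 8, 9]
assert deltaval_decode(base, offsets) == block

# Bits needed for the widest stored number, before and after encoding.
print(max(block).bit_length(), max(offsets).bit_length())  # 10 4
```

The saving comes from the offsets needing far fewer bits than the raw values, which only works because the values in the block are close together.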

On top of this encoding, Vertica applies its internal binary compression (which can be, for example, gzip). In the end you have two layers of compression, which together give a very good compression ratio.
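The two layers can be sketched in Python as "encode first, then gzip the encoded bytes". This is an illustration under simplifying assumptions, not Vertica's format: the hypothetical helpers below store one unsigned byte per offset, so every offset must fall in 0..255.

```python
import gzip
import struct

def encode_then_compress(block):
    # Layer 1 (encoding): store the block minimum plus offsets from it.
    base = min(block)
    offsets = [v - base for v in block]
    # Assumption for this sketch: every offset fits in one unsigned byte
    # (0..255); bytes() raises ValueError otherwise.
    payload = struct.pack("<q", base) + bytes(offsets)
    # Layer 2 (compression): generic gzip over the encoded bytes.
    return gzip.compress(payload)

def decompress_then_decode(blob):
    payload = gzip.decompress(blob)
    (base,) = struct.unpack_from("<q", payload)
    # Iterating over bytes yields ints, i.e. the stored offsets.
    return [base + o for o in payload[8:]]

block = [1000 + (i % 200) for i in range(10_000)]
raw_size = 8 * len(block)  # 64-bit integers, no encoding, no compression
blob = encode_then_compress(block)
print(raw_size, len(blob))
assert decompress_then_decode(blob) == block
```

Reading the data back reverses the layers in the opposite order: decompress first, then decode.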

At query time, Vertica knows which blocks can contain the requested data and decompresses from binary form only the blocks the query needs. In addition, some execution operators can still run directly on the encoded data. So you get both good compression and optimized query execution.
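One common way to know which blocks can hold the data is to keep small per-block metadata, such as the minimum and maximum value, outside the compressed payload. The Python sketch below assumes exactly that; `BLOCK_SIZE`, `build_blocks`, and `range_query` are hypothetical and are not meant to describe Vertica's actual storage.

```python
import gzip
import struct

BLOCK_SIZE = 4  # hypothetical; tiny so the example has several blocks

def build_blocks(values):
    """Split a column into blocks; keep min/max metadata uncompressed."""
    blocks = []
    for i in range(0, len(values), BLOCK_SIZE):
        chunk = values[i:i + BLOCK_SIZE]
        blob = gzip.compress(struct.pack(f"<{len(chunk)}q", *chunk))
        blocks.append({"min": min(chunk), "max": max(chunk),
                       "count": len(chunk), "blob": blob})
    return blocks

def range_query(blocks, lo, hi):
    """Return values in [lo, hi] and how many blocks were decompressed."""
    result, opened = [], 0
    for b in blocks:
        if b["max"] < lo or b["min"] > hi:
            continue  # metadata rules this block out; it stays compressed
        opened += 1
        chunk = struct.unpack(f"<{b['count']}q", gzip.decompress(b["blob"]))
        result.extend(v for v in chunk if lo <= v <= hi)
    return result, opened

values = list(range(1000, 1040))
blocks = build_blocks(values)
rows, opened = range_query(blocks, 1009, 1014)
print(rows, f"{opened} of {len(blocks)} blocks decompressed")
# [1009, 1010, 1011, 1012, 1013, 1014] 2 of 10 blocks decompressed
```

Pruning works best when the column is sorted, because then each block covers a narrow, non-overlapping range of values.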

I hope it's clearer now.

Thanks

FiliN · July 2016

Predicates on columns whose data is encoded in certain formats, such as run-length encoding, can be applied directly on the encoded data, thus bypassing the overhead of decompression and reducing the quantity of data to be copied and processed through the joins. This is particularly advantageous for low cardinality columns that are encoded.
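To make the run-length case concrete, here is a small Python sketch: a low-cardinality column (a hypothetical "region" column) is stored as (value, run length) pairs, and an equality predicate is checked once per run instead of once per row. The helper names are illustrative only.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

def count_where_equal(runs, target):
    # The predicate is evaluated once per run, never on expanded rows.
    return sum(n for v, n in runs if v == target)

column = ["EU"] * 5 + ["US"] * 3 + ["EU"] * 2 + ["APAC"] * 4
runs = rle_encode(column)
print(runs)  # [('EU', 5), ('US', 3), ('EU', 2), ('APAC', 4)]
print(count_where_equal(runs, "EU"))  # 7
```

Here 14 rows collapse into 4 runs, so the predicate does 4 comparisons; the fewer distinct values and the longer the runs (for example, on a sorted column), the bigger the gain.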
