Understanding the difference between Encoding and Compression in a practical way
Hi
I want to understand the difference between encoding and compression in a practical way, i.e. how each one actually works. Any help will be highly appreciated.
Thank You
Ujjwal Rana
To simplify, I would say:
Encoding - compression based on the data distribution and the similarity between values (it is aware of the data type).
Compression - plain binary compression that does not take the type of the data into account.
Vertica can filter data directly on encoded data; however, it must decompress compressed data before it can filter it.
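To make the contrast concrete, here is a minimal Python sketch. It assumes `zlib` as a stand-in for Vertica's binary compression and a simple offset-from-minimum scheme as the encoding; neither is Vertica's real storage format, and `filter_compressed` and `filter_encoded` are hypothetical helpers. The compressed bytes must be decompressed before any predicate runs, while the encoded offsets can be compared directly once the predicate is rewritten in terms of the block minimum.

```python
import struct
import zlib

# Hypothetical column block; values taken from the DELTAVAL example in this thread.
column = [1000, 1004, 1007, 1008, 1009]

# --- Generic binary compression (zlib as a stand-in) ---
blob = zlib.compress(struct.pack(f"<{len(column)}q", *column))

def filter_compressed(blob, threshold):
    # The compressed bytes are opaque, so the block must be decompressed
    # and decoded before the predicate "value > threshold" can be checked.
    raw = zlib.decompress(blob)
    values = struct.unpack(f"<{len(raw) // 8}q", raw)
    return [v for v in values if v > threshold]

# --- Type-aware encoding: each value stored as an offset from the block minimum ---
base = min(column)
offsets = [v - base for v in column]

def filter_encoded(base, offsets, threshold):
    # "value > threshold" is rewritten as "offset > threshold - base",
    # so the comparison runs on the offsets without decoding them.
    limit = threshold - base
    return [i for i, off in enumerate(offsets) if off > limit]

print(filter_compressed(blob, 1005))        # [1007, 1008, 1009]
print(filter_encoded(base, offsets, 1005))  # [2, 3, 4] (row positions)
```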
Thanks
Hi Eli
Can you give me a query-based example of ENCODING and COMPRESSION?
With Regards
Ujjwal
Sure, let's take DELTAVAL as an example.
Vertica's definition of DELTAVAL is “data is recorded as a difference from the smallest value in the data block”. Let's assume your data block contains sequential, increasing numbers.
Let's say your data block contains the following values:
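1000, 1004, 1007, 1008, 1009. With DELTAVAL, Vertica stores something like the block minimum 1000 plus the differences 0, 4, 7, 8, 9 (recording the difference from the previous value instead, 1000, 4, 3, 1, 1, is a related scheme). The small differences need far fewer bits than the full values. You can see this as a kind of compression, but it is not binary compression; it is a type-aware compression that relies on knowing the values are integers.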
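The DELTAVAL idea can be sketched in a few lines of Python, following the definition quoted above (offsets from the block minimum). `deltaval_encode`, `deltaval_decode`, and `delta_prev_encode` are hypothetical names for illustration, not Vertica functions; the last one shows the difference-from-previous-value variant (the 4, 3, 1, 1 sequence) for comparison.

```python
def deltaval_encode(block):
    """Offset-from-minimum encoding, following the DELTAVAL definition."""
    base = min(block)
    return base, [v - base for v in block]

def deltaval_decode(base, offsets):
    """Rebuild the original values by adding the base back."""
    return [base + off for off in offsets]

def delta_prev_encode(block):
    """For contrast: each value stored as a difference from the previous one."""
    return block[0], [b - a for a, b in zip(block, block[1:])]

block = [1000, 1004, 1007, 1008, 1009]
print(deltaval_encode(block))    # (1000, [0, 4, 7, 8, 9])
print(delta_prev_encode(block))  # (1000, [4, 3, 1, 1])
assert deltaval_decode(*deltaval_encode(block)) == block
```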
On top of this type-aware encoding, Vertica applies its internal binary compression (for example, gzip), so you end up with two layers of compression that together give a very good compression ratio.
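A rough sketch of the two layers, assuming offset-from-minimum encoding as layer one and `zlib` (the DEFLATE algorithm that gzip also uses) as layer two. The helper names and the choice of packing offsets into the narrowest unsigned integer type are assumptions for illustration; Vertica's actual block format is not shown here.

```python
import struct
import zlib

def smallest_format(max_value):
    # Pick the narrowest unsigned struct type that can hold every offset.
    for fmt, limit in (("B", 0xFF), ("H", 0xFFFF), ("I", 0xFFFFFFFF)):
        if max_value <= limit:
            return fmt
    return "Q"

def encode_then_compress(block):
    base = min(block)
    offsets = [v - base for v in block]            # layer 1: type-aware encoding
    fmt = smallest_format(max(offsets))
    packed = struct.pack(f"<{len(offsets)}{fmt}", *offsets)
    return base, fmt, len(offsets), zlib.compress(packed)  # layer 2: binary compression

def decompress_then_decode(base, fmt, count, blob):
    offsets = struct.unpack(f"<{count}{fmt}", zlib.decompress(blob))
    return [base + off for off in offsets]

block = list(range(1_000_000, 1_010_000, 3))  # hypothetical sorted column block
raw_size = len(zlib.compress(struct.pack(f"<{len(block)}q", *block)))
base, fmt, count, blob = encode_then_compress(block)
print("zlib only:", raw_size, "bytes; encoded + zlib:", len(blob), "bytes")
assert decompress_then_decode(base, fmt, count, blob) == block
```

The printed sizes depend on the data. For sorted, closely spaced values like these, each offset fits in 2 bytes instead of 8, so the binary compressor starts from a much smaller input.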
At query time, Vertica knows which blocks can contain the requested data, so it decompresses only the blocks the query needs. In addition, some of the execution operators can run directly on the encoded data. So you get both good compression and optimized query execution.
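The block-skipping behavior can be illustrated with per-block minimum and maximum values kept alongside each compressed block. This is a sketch under assumptions: the tiny `BLOCK_SIZE`, the dictionary layout, and the use of min/max metadata are chosen for illustration and are not a description of Vertica's internals. The query decompresses only the blocks whose range overlaps the predicate and reports how many it touched.

```python
import struct
import zlib

BLOCK_SIZE = 4  # hypothetical; real storage blocks hold far more rows

def build_blocks(values):
    """Split a column into compressed blocks, keeping min/max per block."""
    blocks = []
    for i in range(0, len(values), BLOCK_SIZE):
        chunk = values[i:i + BLOCK_SIZE]
        blob = zlib.compress(struct.pack(f"<{len(chunk)}q", *chunk))
        blocks.append({"min": min(chunk), "max": max(chunk),
                       "count": len(chunk), "blob": blob})
    return blocks

def range_query(blocks, low, high):
    """Return values in [low, high], decompressing only overlapping blocks."""
    result, decompressed = [], 0
    for b in blocks:
        if b["max"] < low or b["min"] > high:
            continue  # skipped using the metadata alone, no decompression
        decompressed += 1
        values = struct.unpack(f"<{b['count']}q", zlib.decompress(b["blob"]))
        result.extend(v for v in values if low <= v <= high)
    return result, decompressed

blocks = build_blocks(list(range(0, 40, 2)))  # 20 values -> 5 blocks
print(range_query(blocks, 9, 13))             # ([10, 12], 1)
```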
I hope it's clearer now.
Thanks
Predicates on columns whose data is encoded in certain formats, such as run-length encoding, can be applied directly on the encoded data, thus bypassing the overhead of decompression and reducing the quantity of data to be copied and processed through the joins. This is particularly advantageous for low cardinality columns that are encoded.
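A minimal sketch of predicate evaluation on run-length encoded data, assuming a sorted, low-cardinality column of hypothetical status values. The predicate is applied once per run, and the matching row count comes from the run lengths, so the column is never expanded back into individual rows.

```python
from itertools import groupby

def rle_encode(values):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

def count_matching(runs, predicate):
    # The predicate is evaluated once per run rather than once per row,
    # and the runs are never expanded back into individual rows.
    return sum(length for value, length in runs if predicate(value))

# Hypothetical low-cardinality, sorted column (for example, a status flag).
column = ["active"] * 6 + ["closed"] * 3 + ["pending"] * 4
runs = rle_encode(column)
print(runs)                                           # [('active', 6), ('closed', 3), ('pending', 4)]
print(count_matching(runs, lambda s: s != "closed"))  # 10
```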