
Understanding the difference between encoding and compression in a practical way

Hi

I would like to understand the difference between encoding and compression in a practical way, i.e. how each one actually works. Any help will be highly appreciated.

 

 

Thank You

 

Ujjwal Rana

Comments

  • Options
    Hi
    To simplify, I would say:
    Encoding - compression that takes advantage of the data itself: its type, distribution, and similarity between values.

    Compression - plain binary compression that does not take the type of the data into account.

    Vertica can filter directly on encoded data, but it must decompress compressed data before it is able to filter it.

    Thanks
  • Options

    Hi Eli

    Can you give me a query-based example of ENCODING and COMPRESSION?

     

    With Regards

     

    Ujjwal

  • Options

    Sure, let's take DELTAVAL as an example:

     

    Vertica's definition of DELTAVAL is "data is recorded as a difference from the smallest value in the data block". Let's assume your data block contains closely spaced, increasing numbers.

    Let's say your data block contains the following values:

    1000, 1004, 1007, 1008, 1009. With DELTAVAL, Vertica will store something like the smallest value, 1000, plus each value's difference from it: 0, 4, 7, 8, 9. You can see it as a kind of compression, but it is not binary compression; it is more of an applicable (data-aware) compression.
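To make the quoted definition concrete, here is a minimal Python sketch of "difference from the smallest value in the block". It only illustrates the idea and is not Vertica's actual storage format; the function names `deltaval_encode` and `deltaval_decode` are made up for this example.

```python
def deltaval_encode(block):
    """Store the smallest value once, plus each value's offset from it."""
    base = min(block)
    return base, [value - base for value in block]


def deltaval_decode(base, offsets):
    """Rebuild the original values from the base and the offsets."""
    return [base + offset for offset in offsets]


block = [1000, 1004, 1007, 1008, 1009]
base, offsets = deltaval_encode(block)
print(base, offsets)  # 1000 [0, 4, 7, 8, 9]
print(deltaval_decode(base, offsets) == block)  # True
```

The offsets are small numbers that need far fewer bits than the original values, which is where the space saving comes from.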

     

    On top of this applicable compression, Vertica will use its internal binary compression (which can be, for example, gzip), so you end up with two layers of compression that together provide a very good compression ratio.
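The two layers can be sketched in Python as follows. This is an illustration under assumptions, not Vertica's on-disk format: the delta layer packs each offset into 16 bits (so it assumes every offset is below 65536), the binary layer uses Python's `zlib` as a stand-in for the internal compressor, and `encode_block` / `decode_block` are hypothetical names.

```python
import struct
import zlib


def encode_block(values):
    """Layer 1: offsets from the smallest value. Layer 2: zlib on the bytes."""
    base = min(values)
    offsets = [v - base for v in values]
    # Assumption: every offset fits in an unsigned 16-bit integer.
    payload = struct.pack(f"<q{len(offsets)}H", base, *offsets)
    return zlib.compress(payload)


def decode_block(blob, count):
    """Undo the binary layer first, then the delta layer."""
    payload = zlib.decompress(blob)
    base, *offsets = struct.unpack(f"<q{count}H", payload)
    return [base + offset for offset in offsets]


values = [1000 + 3 * i for i in range(5000)]
raw = struct.pack(f"<{len(values)}q", *values)  # 8 bytes per value
blob = encode_block(values)

print("raw bytes:      ", len(raw))
print("after 2 layers: ", len(blob))
print("round trip ok:  ", decode_block(blob, len(values)) == values)
```

The first layer uses knowledge of the data (integers close to each other), while the second layer treats its input as opaque bytes, which matches the encoding versus compression distinction described above.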

     

    At query time, Vertica knows which blocks hold the relevant data and decompresses, from binary form, only the blocks needed for the query. In addition, some of the execution operators can still run on the encoded data. So you get both good compression and optimized query execution.

     

     

    I hope it is clearer now.

     

    Thanks

  • Options

    Predicates on columns whose data is encoded in certain formats, such as run-length encoding, can be applied directly on the encoded data, thus bypassing the overhead of decompression and reducing the quantity of data to be copied and processed through the joins. This is particularly advantageous for low cardinality columns that are encoded.
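As a rough illustration of applying a predicate directly on run-length encoded data, the Python sketch below evaluates the filter once per run instead of once per row. It is a toy model, not Vertica's implementation; `rle_encode` and `count_matching` are hypothetical helper names.

```python
def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, length) for value, length in runs]


def count_matching(runs, predicate):
    """Count matching rows by testing each run once, without expanding it."""
    return sum(length for value, length in runs if predicate(value))


# A sorted, low-cardinality column: 12 rows but only 3 distinct runs.
column = ["CA"] * 4 + ["NY"] * 3 + ["TX"] * 5
runs = rle_encode(column)
print(runs)  # [('CA', 4), ('NY', 3), ('TX', 5)]
print(count_matching(runs, lambda state: state == "NY"))  # 3
```

Here the predicate runs 3 times rather than 12, which is why the benefit grows as cardinality drops and runs get longer.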

     
