Fine-grained control over compressions used for some encoding types

newbieguy · May 2020

I'm new to Vertica and attempting to understand how it handles encoding and compression of column data. Specifically, I'm interested in learning about whether or not it's possible to exactly control the encoding type and compression type for a column. For example, suppose that I created the following table and inserted some data:

CREATE TABLE test_table (company VARCHAR(90), order_count INT);
CREATE PROJECTION test_table_super (company, order_count ENCODING RLE) 
                  AS SELECT company, order_count FROM test_table;
INSERT INTO test_table VALUES ('BigCompany', 823);

If I then take a look at the compressions and encodings fields, I find:

select column_name, encodings, compressions from v_monitor.column_storage
                  where projection_name = 'test_table_super';

 column_name | encodings | compressions
-------------+-----------+--------------
 company     | String    | lzo
 order_count | RLE       | lzo
 epoch       | Int_Delta | none

The order_count field has RLE as its encoding and lzo as its compression. Would it be possible to force that column to use the bzip2 compression algorithm instead? Or maybe gzip even? Or is that kind of precise control just not exposed to the user?

Thanks!!

LenoyJ · May 2020

Something like?

dbadmin=> select get_config_parameter('SystemDefaultCompressionType');
get_config_parameter
----------------------
LZO
(1 row)
dbadmin=> select set_config_parameter('SystemDefaultCompressionType', 'GZIP');
    set_config_parameter
----------------------------
 Parameter set successfully
(1 row)

dbadmin=> CREATE TABLE test_table (company VARCHAR(90), order_count INT);
CREATE TABLE
dbadmin=> CREATE PROJECTION test_table_super (company, order_count ENCODING RLE)
dbadmin->                   AS SELECT company, order_count FROM test_table;
CREATE PROJECTION
dbadmin=> INSERT INTO test_table VALUES ('BigCompany', 823);
 OUTPUT
--------
      1
(1 row)

dbadmin=> select column_name, encodings, compressions from v_monitor.column_storage
dbadmin->                   where projection_name = 'test_table_super';
 column_name | encodings | compressions
-------------+-----------+--------------
 company     | String    | gzip
 order_count | RLE       | gzip
 epoch       | Int_Delta | none
 company     | String    | gzip
 order_count | RLE       | gzip
 epoch       | Int_Delta | none

newbieguy · May 2020

Thanks so much for this suggestion. I thought that there would be something to allow changing the type of compression used.
This system parameter seems to apply to the entire database, right? Or does it apply to all databases in the cluster?
Out of curiosity, is it possible to control the compression algorithm used on a more fine-grained level? Such as having some columns in a table compressed with LZO and some with GZIP?

LenoyJ · May 2020

It's rare to see more than one (production) database on one cluster. But yeah the setting applies to the database. Maybe the other experts can chime in if there's a more fine-grained way. But, I'm more interested in what you're trying to achieve. There are many benchmarks out there that compare LZO vs GZIP vs BZIP2, and LZO is the better of the three that provides very fast decompression speeds (which in turn translates to more query performance) and that's probably why it's set as the default.

newbieguy · May 2020

Regarding what I'm trying to accomplish, I am mainly just trying to understand what things are possible with Vertica. If it were possible to select different compressions for different fields, then it might be possible to do things like using ZSTD_FAST_COMP for some fields while keeping the default of LZO for other fields. I admit that I'm uncertain about whether or not this would actually be valuable.

I think that part of the reason why the usefulness of this capability may be limited is due to the fact that there are some "encoding types" that have both an encoding and a compression (like RLE), and a lot that don't. If you have a VARCHAR field, then the encoding is just "String" and then there is also a compression that you can specify, so there wouldn't be much use for any more fine-grained control there. If you have an INT field that has the DELTAVAL encoding type, then the encoding is listed as "Int_Delta" and the compression is "none". No idea if any form of compression is even a possibility there. Anyway, trying to understand this stuff is just part of the design process for enormous data volumes, where any tiny optimization is pursued aggressively.

We're Moving!

Create My New Community Account Now

Fine-grained control over compressions used for some encoding types

Answers

Leave a Comment