Vertica tip: are your columns too wide?

Bryan_HBryan_H Vertica Employee Administrator

This tip expands on the earlier post on encoding and compression at https://www.vertica.com/blog/checking-and-improving-column-compression-and-encoding/

When you have millions to billions of rows, data type becomes a bit more important: even an extra 10 bytes per row across a huge data set will impact storage or performance (or both!).

When I created the big_fact_table, I included some VARCHAR fields, but failed to set a width. Vertica defaults to VARCHAR(80) in this case. However, all of my strings are 8-character alpahnumerics, so why have VARCHAR(80) when VARCHAR(10) will suffice? I created a new table named "varchar10" with the same fields but altered the VARCHAR fields to VARCHAR(10) with ALTER TABLE...ALTER COLUMN...SET DATA TYPE VARCHAR(10).

This had almost no impact on table size thanks to compression and encoding:

dbadmin=> select anchor_table_name, projection_name, used_bytes from projection_storage where anchor_table_name like 'big_fact_table%' limit 5;
anchor_table_name | projection_name | used_bytes
-----------------------------+-----------------------------------------+------------
big_fact_table | big_fact_table_super | 6843970195
big_fact_table_varchar10 | big_fact_table_varchar10_super | 6843968398

BUT, when I load data with INSERT...SELECT, I see a significant performance improvement with the smaller VARCHAR:

dbadmin=> insert into public.big_fact_table select * from public.big1090;

OUTPUT

445580312
(1 row)
Time: First fetch (1 row): 380333.748 ms. All rows formatted: 380333.823 ms
dbadmin=> insert into public.big_fact_table_varchar10 select * from public.big1090;

OUTPUT

445580312
(1 row)
Time: First fetch (1 row): 269333.788 ms. All rows formatted: 269333.864 ms

That is around 25% faster! Why? During load and query, Vertica must allocate enough resources assuming the fields will actually contain the full data size. If you allocate too much memory, this will slow down sorts, segmentation, and so on as Vertica needs to move around data types that have lots of empty space. So optimizing data type and length helps Vertica optimize memory usage and run faster!

You can check whether you've allocated too much space in a VARCHAR or VARBINARY with SELECT MAX(LENGTH(<column_name>)) FROM <table_name>; and compare to the column DDL.

For more info and limitations: https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/AdministratorsGuide/Tables/ColumnManagement/ReduceColumnWidth.htm

Comments

  • Anonymous_WombatAnonymous_Wombat Vertica Employee Employee

    This is an incredible help. Provides illustration and example as well as the way to check. Thank you!

Sign In or Register to comment.