Column Level Encoding using GZIP_COMP
I am trying to reduce table size, so used DESIGNER_DESIGN_PROJECTION_ENCODINGS for recommendations. Along with that, in one of the column VARCHAR 200, I encoded using GZIP_COMP. Table size is reduced, but when I tried to select the column, the values are like this "���/�����E�s�4�)}�q4�L�\�2!,".
Could you please let me know if there is any technique to fetch the column values properly.
0
Comments
Hi,
I don't see a problem in Vertica 9.0.0-1:
dbadmin=> create table test_gzip_comp (col1 varchar(200)); CREATE TABLE dbadmin=> create projection test_gzip_comp_pr (col1 encoding gzip_comp) as select col1 from test_gzip_comp; CREATE PROJECTION dbadmin=> insert into test_gzip_comp select 'How does this look?'; OUTPUT -------- 1 (1 row) dbadmin=> select col1 from test_gzip_comp_pr; col1 --------------------- How does this look? (1 row)Looks fine
Are those non-UTF8 characters in your table? What does the ISUTF8() function show for you?
Example:
Hi Jim,
We are using Vertica Analytic Database v7.2.3-4. And, the column is not a UTF8 and the isUTF8() shows false for this column.
I'm guessing that because the data is not UTF8 it's showing up like that. That is, whatever client tool you are using can't display those characters so they'll show up as funky question marks.
See:
https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/AdministratorsGuide/ConfiguringTheDB/ImplementLocalesForInternationalDataSets.htm
https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/AdministratorsGuide/ConfiguringTheDB/BestPracticesForWorkingWithLocales.htm
http://vertica-forums.com/viewtopic.php?f=3&t=1030&hilit=isutf8
I just tested GZIP encoding in test table as you did and it worked.
Where as in other table, ISUTF8() shows true before encoding and false after encoding.
I didn't insert any record after applying encoding and this was an existing table where I tried to apply encoding. Is it because of this?
Even the above scenario works with test table.
The column holds only numbers and not any other language
Junk values are shown when an existing projection has been altered with GZIP_COMP encoding . It works fine when a table is created with new projection and inserting values from the other table.
I tested with a table of size 77GB and applied GZIP in 8 columns where characters are greater than 50. Now the table size is reduced to 69GB. I am curious to reduce the size of the table even more by using encoding in all VARCHAR columns.
Is it advisable to use GZIP_COMP encoding in all VARCHAR fields? even though it is not recommended by DESIGNER_DESIGN_PROJECTION_ENCODINGS.
@Vertica_User - You said "Junk values are shown when an existing projection has been altered with GZIP_COMP encoding..."
How are you altering an existing projection's column to be GZIP_COMP encoded?
Re Why DBD doesn't recommend GZIP:
GZIP tends to consume a significant amount of CPU - it's likely to impact load and query performance. Historically, this performance cost has been more than the average customer has been willing to pay for higher compression. But your circumstances may be different than the average customer.
Hi Jim,
Please find below steps,
1. Create new projection PR_NEW with encodings for table A
2. select refresh('
Ok. Fyi ... In Vertica 9.0.0-1, I do not see an issue with having data in a table and then changing it to use GZIP_COMP...
Example:
dbadmin=> create table test_gzip_comp (col1 varchar(200)); CREATE TABLE dbadmin=> insert into test_gzip_comp select 'How does this look?'; OUTPUT -------- 1 (1 row) dbadmin=> create projection test_gzip_comp_pr (col1 encoding gzip_comp) as select col1 from test_gzip_comp; WARNING 4468: Projection <public.test_gzip_comp_pr> is not available for query processing. Execute the select start_refresh() function to copy data into this projection. The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh CREATE PROJECTION dbadmin=> select refresh('test_gzip_comp'); refresh ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Refresh completed with the following outcomes: Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)] ---------------------------------------------------------------------------------------- "public"."test_gzip_comp_pr": [test_gzip_comp] [refreshed] [scratch] [0] [0] (1 row) dbadmin=> select make_ahm_now(); make_ahm_now ------------------------------- AHM set (New AHM Epoch: 3446) (1 row) dbadmin=> select projection_name, is_super_projection from projections where anchor_table_name = 'test_gzip_comp'; projection_name | is_super_projection ----------------------+--------------------- test_gzip_comp_super | t test_gzip_comp_pr | t (2 rows) dbadmin=> drop projection test_gzip_comp_super; DROP PROJECTION dbadmin=> alter projection test_gzip_comp_pr rename to test_gzip_comp_super; ALTER PROJECTION dbadmin=> select col1, isutf8(col1) from test_gzip_comp; col1 | isutf8 ---------------------+-------- How does this look? | t (1 row) dbadmin=> select export_objects('','test_gzip_comp'); export_objects ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- CREATE TABLE public.test_gzip_comp ( col1 varchar(200) ); CREATE PROJECTION public.test_gzip_comp_super ( col1 ENCODING GZIP_COMP ) AS SELECT test_gzip_comp.col1 FROM public.test_gzip_comp ORDER BY test_gzip_comp.col1 SEGMENTED BY hash(test_gzip_comp.col1) ALL NODES OFFSET 0; SELECT MARK_DESIGN_KSAFE(0); (1 row)Is it a problem with this version Vertica Analytic Database v7.2.3-4. We are using this version