Column Level Encoding using GZIP_COMP
I am trying to reduce table size, so used DESIGNER_DESIGN_PROJECTION_ENCODINGS for recommendations. Along with that, in one of the column VARCHAR 200, I encoded using GZIP_COMP. Table size is reduced, but when I tried to select the column, the values are like this "���/�����E�s�4�)}�q4�L�\�2!,".
Could you please let me know if there is any technique to fetch the column values properly.
0
Comments
Hi,
I don't see a problem in Vertica 9.0.0-1:
Looks fine
Are those non-UTF8 characters in your table? What does the ISUTF8() function show for you?
Example:
Hi Jim,
We are using Vertica Analytic Database v7.2.3-4. And, the column is not a UTF8 and the isUTF8() shows false for this column.
I'm guessing that because the data is not UTF8 it's showing up like that. That is, whatever client tool you are using can't display those characters so they'll show up as funky question marks.
See:
https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/AdministratorsGuide/ConfiguringTheDB/ImplementLocalesForInternationalDataSets.htm
https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/AdministratorsGuide/ConfiguringTheDB/BestPracticesForWorkingWithLocales.htm
http://vertica-forums.com/viewtopic.php?f=3&t=1030&hilit=isutf8
I just tested GZIP encoding in test table as you did and it worked.
Where as in other table, ISUTF8() shows true before encoding and false after encoding.
I didn't insert any record after applying encoding and this was an existing table where I tried to apply encoding. Is it because of this?
Even the above scenario works with test table.
The column holds only numbers and not any other language
Junk values are shown when an existing projection has been altered with GZIP_COMP encoding . It works fine when a table is created with new projection and inserting values from the other table.
I tested with a table of size 77GB and applied GZIP in 8 columns where characters are greater than 50. Now the table size is reduced to 69GB. I am curious to reduce the size of the table even more by using encoding in all VARCHAR columns.
Is it advisable to use GZIP_COMP encoding in all VARCHAR fields? even though it is not recommended by DESIGNER_DESIGN_PROJECTION_ENCODINGS.
@Vertica_User - You said "Junk values are shown when an existing projection has been altered with GZIP_COMP encoding..."
How are you altering an existing projection's column to be GZIP_COMP encoded?
Re Why DBD doesn't recommend GZIP:
GZIP tends to consume a significant amount of CPU - it's likely to impact load and query performance. Historically, this performance cost has been more than the average customer has been willing to pay for higher compression. But your circumstances may be different than the average customer.
Hi Jim,
Please find below steps,
1. Create new projection PR_NEW with encodings for table A
2. select refresh('
Ok. Fyi ... In Vertica 9.0.0-1, I do not see an issue with having data in a table and then changing it to use GZIP_COMP...
Example:
Is it a problem with this version Vertica Analytic Database v7.2.3-4. We are using this version