Column Level Encoding using GZIP_COMP

I am trying to reduce table size, so used DESIGNER_DESIGN_PROJECTION_ENCODINGS for recommendations. Along with that, in one of the column VARCHAR 200, I encoded using GZIP_COMP. Table size is reduced, but when I tried to select the column, the values are like this "���/�����E�s�4�)}�q4�L�\�2!,".
Could you please let me know if there is any technique to fetch the column values properly.

Comments

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited December 2017

    Hi,

    I don't see a problem in Vertica 9.0.0-1:

    dbadmin=> create table test_gzip_comp (col1 varchar(200));
    CREATE TABLE
    
    dbadmin=> create projection test_gzip_comp_pr (col1 encoding gzip_comp) as select col1 from test_gzip_comp;
    CREATE PROJECTION
    
    dbadmin=> insert into test_gzip_comp select 'How does this look?';
    OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> select col1 from test_gzip_comp_pr;
            col1
    ---------------------
    How does this look?
    (1 row)
    

    Looks fine :)

    Are those non-UTF8 characters in your table? What does the ISUTF8() function show for you?

    Example:

    dbadmin=> select isutf8(col1) from test_gzip_comp;
     isutf8
    --------
     t
    (1 row)
    
  • Hi Jim,
    We are using Vertica Analytic Database v7.2.3-4. And, the column is not a UTF8 and the isUTF8() shows false for this column.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited December 2017
  • I just tested GZIP encoding in test table as you did and it worked.
    Where as in other table, ISUTF8() shows true before encoding and false after encoding.

    I didn't insert any record after applying encoding and this was an existing table where I tried to apply encoding. Is it because of this?

    Even the above scenario works with test table.

  • The column holds only numbers and not any other language

  • Junk values are shown when an existing projection has been altered with GZIP_COMP encoding . It works fine when a table is created with new projection and inserting values from the other table.

    I tested with a table of size 77GB and applied GZIP in 8 columns where characters are greater than 50. Now the table size is reduced to 69GB. I am curious to reduce the size of the table even more by using encoding in all VARCHAR columns.

    Is it advisable to use GZIP_COMP encoding in all VARCHAR fields? even though it is not recommended by DESIGNER_DESIGN_PROJECTION_ENCODINGS.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited December 2017

    @Vertica_User - You said "Junk values are shown when an existing projection has been altered with GZIP_COMP encoding..."

    How are you altering an existing projection's column to be GZIP_COMP encoded?

  • Re Why DBD doesn't recommend GZIP:
    GZIP tends to consume a significant amount of CPU - it's likely to impact load and query performance. Historically, this performance cost has been more than the average customer has been willing to pay for higher compression. But your circumstances may be different than the average customer.

  • Hi Jim,
    Please find below steps,
    1. Create new projection PR_NEW with encodings for table A
    2. select refresh('

    '); 3.select wait_for_ahm_now(); 4.DROP PROJECTION PR_OLD CASCADE; 5. ALTER PROJECTION PR_NEW RENAME TO PR_OLD;
  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Ok. Fyi ... In Vertica 9.0.0-1, I do not see an issue with having data in a table and then changing it to use GZIP_COMP...

    Example:

    dbadmin=> create table test_gzip_comp (col1 varchar(200));
    CREATE TABLE
    
    dbadmin=> insert into test_gzip_comp select 'How does this look?';
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> create projection test_gzip_comp_pr (col1 encoding gzip_comp) as select col1 from test_gzip_comp;
    WARNING 4468:  Projection <public.test_gzip_comp_pr> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
              The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
    CREATE PROJECTION
    
    dbadmin=> select refresh('test_gzip_comp');
                                                                                                                                                        refresh                                                
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Refresh completed with the following outcomes:
    Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
    ----------------------------------------------------------------------------------------
    "public"."test_gzip_comp_pr": [test_gzip_comp] [refreshed] [scratch] [0] [0]
    
    (1 row)
    
    dbadmin=> select make_ahm_now();
             make_ahm_now
    -------------------------------
     AHM set (New AHM Epoch: 3446)
    (1 row)
    
    dbadmin=> select projection_name, is_super_projection from projections where anchor_table_name  = 'test_gzip_comp';
       projection_name    | is_super_projection
    ----------------------+---------------------
     test_gzip_comp_super | t
     test_gzip_comp_pr    | t
    (2 rows)
    
    dbadmin=> drop projection test_gzip_comp_super;
    DROP PROJECTION
    
    dbadmin=> alter  projection test_gzip_comp_pr rename to test_gzip_comp_super;
    ALTER PROJECTION
    
    dbadmin=> select col1, isutf8(col1) from test_gzip_comp;
            col1         | isutf8
    ---------------------+--------
     How does this look? | t
    (1 row)
    
    dbadmin=> select export_objects('','test_gzip_comp');
                                                                                                                                                              export_objects                                   
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    
    CREATE TABLE public.test_gzip_comp
    (
        col1 varchar(200)
    );
    
    
    CREATE PROJECTION public.test_gzip_comp_super
    (
     col1 ENCODING GZIP_COMP
    )
    AS
     SELECT test_gzip_comp.col1
     FROM public.test_gzip_comp
     ORDER BY test_gzip_comp.col1
    SEGMENTED BY hash(test_gzip_comp.col1) ALL NODES OFFSET 0;
    
    
    SELECT MARK_DESIGN_KSAFE(0);
    
    (1 row)
    
  • Is it a problem with this version Vertica Analytic Database v7.2.3-4. We are using this version

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file