Hash and Re Hash

Hi,

 

I need help , we are looking to use hash function to our bigest text .

 

For example 

 

select 'VerticaGM'as Name, hash('VerticaGM') as Name_Value;

 

Result :-

 

Name         Name_Value

VerticaGM 4870610383584265370

 

can I rehash 4870610383584265370 to get VerticaGM ?

 

Thank you.

Comments

  •   Hash functions are unidirectional, that is they are not be reversible.

     

    you need ot use other functions 

  • what are other functions or UDF available?

     

    I appreciate for your help.

  •  There is an encryption pkg at 

    https://github.com/vertica/Vertica-Extension-Packages/tree/master/encryption_package

    - this might doing the trick for you if you are trying to hyde it. 

     

      If you don`t wanna have it in your result sets becouse is to big then you can choose the relation aproach and create Key to it instead of having it draged around your most queried table.

  • Do you have built in function's to hash and re hash.

     

    Our problem we have very large text column as PK , charter joins are very costly .

     

    So we are looking to hash it , so Integer joins are faster . At same time we need to rehash it to orginal value .

     

    Ex : Vertica hash value is 5605021174313838236

           If I rehase 5605021174313838236 = Vertica

     

    I'm looking for funcation to do this.

     

     

  • Hi
    You can easily create projection that used an expression , the projection will include both ,  your original column and the hash based column , then you can explicitly used this projection for your queries :

     

    Eg:

     

    CREATE PROJECTION my_prg (month, sales) AS 
    SELECT hash(sales) AS month_hash, sales AS sales
    FROM monthly_sales ;

     

    I hope you will find it useful

     

    Thanks 

  •  ok  ,

    Why not adding a new default column to hold the hash for the text field.

     

    create table huge_text
    (
    text LONG VARCHAR(100000),
    idhash int default hash(text)
    )
    ;
    -- tbl2
    create table huge_text2
    (
    text LONG VARCHAR(100000),
    idhash int default hash(text)
    )

     

    Insert some data :

    insert into huge_text(text) values ('The weather tomorrow will be cold 
    and rainy and then on the day after, the sun..... 90k more character');

    See the data:

    select * from huge_text2;

    810386705167248200 | The weather tomorrow will be cold and rainy

    See the data in both tables: 

    select idhash,text from huge_text
    union all
    select idhash,text from huge_text2;
    -----------------------------------------------
    810386705167248200 | The weather tomorrow will be cold a......
    810386705167248200 | The weather tomorrow will be cold a......

    Make you idhash primary key:

    ALTER TABLE   huge_text ADD PRIMARY KEY (idhash)

    Make fk on idhaah of the tbl 2 :

    ALTER TABLE public.huge_text2 ADD CONSTRAINT huge_text2_fk1 
    FOREIGN KEY (idhash) references public.huge_text (idhash);

    Your model is optimized now :) , i guess 

     

     Also adapt your projection build based on your query requirments

  • Thank you , Much appricated!

     

    Does hash value will change if we add more nodes 

     

    Ex : we have 30 nodes , we updated to 60 nodes

  •  Not sure i understood your qustion ? 

     

    Nodes = Vertica Cluster Nodes ? 

     

      If you add new nodes to the database should and wont influence or change any of the columns and table definition,

     When adding more nodes to your Vertica cluster you need to rebalance the cluster. This is a different task and won't affect any of the hash default column definiton.

     

     

  • Can you cross check , I See this issue with other MPP platfrom's.

     

    yes , Nodes = Vertica Cluster Nodes.

  •  

     You mean to tell me that by altering the infrastructure (adding or removing nodes) will affect the default values stored in a column of a table ?

      I have never encounterd any similar situation that this happened. 

     

     Try to reproduce this and i will belive it.

     

     It might be that when you have added the new node you haven't rpelicated the proejctions across all nodes, or look at your projection description where you have custom segmentation rules and one projection sits only in node A,B, and C out of A,B,C,D,E,F. becouse you decided so. 

     

    See this page for more info:

    https://community.dev.hpe.com/t5/Vertica-Knowledge-Base/Understanding-Rebalancing-Part-1-What-Happens-During-Rebalancing/ta-p/231792

     

      

  • Thank you for your info ,I will try in my dev system adding new nodes and re run hash function .

     

    I will let you know,

     

     

  • Navin_CNavin_C Vertica Customer

    Hash function for a string will always be unique.

     

    try these, they are all different.
    select hash('abc')
    select hash('ABC')
    select hash('cba')

     

     

    String Hash function provided by Vertica is meant for external use by developers, its not used by Vertica internal operations. So the distribution of the data using hash function is different that the one used above.

     

    So you can hash your values and store it, maybe even use it as an dictionary(don't do it if  many values and will be used as dimension).

     

     

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file