Hash and Re Hash



I need help , we are looking to use hash function to our bigest text .


For example 


select 'VerticaGM'as Name, hash('VerticaGM') as Name_Value;


Result :-


Name         Name_Value

VerticaGM 4870610383584265370


can I rehash 4870610383584265370 to get VerticaGM ?


Thank you.


  • Options

      Hash functions are unidirectional, that is they are not be reversible.


    you need ot use other functions 

  • Options

    what are other functions or UDF available?


    I appreciate for your help.

  • Options

     There is an encryption pkg at 


    - this might doing the trick for you if you are trying to hyde it. 


      If you don`t wanna have it in your result sets becouse is to big then you can choose the relation aproach and create Key to it instead of having it draged around your most queried table.

  • Options

    Do you have built in function's to hash and re hash.


    Our problem we have very large text column as PK , charter joins are very costly .


    So we are looking to hash it , so Integer joins are faster . At same time we need to rehash it to orginal value .


    Ex : Vertica hash value is 5605021174313838236

           If I rehase 5605021174313838236 = Vertica


    I'm looking for funcation to do this.



  • Options

    You can easily create projection that used an expression , the projection will include both ,  your original column and the hash based column , then you can explicitly used this projection for your queries :




    CREATE PROJECTION my_prg (month, sales) AS 
    SELECT hash(sales) AS month_hash, sales AS sales
    FROM monthly_sales ;


    I hope you will find it useful



  • Options

     ok  ,

    Why not adding a new default column to hold the hash for the text field.


    create table huge_text
    text LONG VARCHAR(100000),
    idhash int default hash(text)
    -- tbl2
    create table huge_text2
    text LONG VARCHAR(100000),
    idhash int default hash(text)


    Insert some data :

    insert into huge_text(text) values ('The weather tomorrow will be cold 
    and rainy and then on the day after, the sun..... 90k more character');

    See the data:

    select * from huge_text2;

    810386705167248200 | The weather tomorrow will be cold and rainy

    See the data in both tables: 

    select idhash,text from huge_text
    union all
    select idhash,text from huge_text2;
    810386705167248200 | The weather tomorrow will be cold a......
    810386705167248200 | The weather tomorrow will be cold a......

    Make you idhash primary key:

    ALTER TABLE   huge_text ADD PRIMARY KEY (idhash)

    Make fk on idhaah of the tbl 2 :

    ALTER TABLE public.huge_text2 ADD CONSTRAINT huge_text2_fk1 
    FOREIGN KEY (idhash) references public.huge_text (idhash);

    Your model is optimized now :) , i guess 


     Also adapt your projection build based on your query requirments

  • Options

    Thank you , Much appricated!


    Does hash value will change if we add more nodes 


    Ex : we have 30 nodes , we updated to 60 nodes

  • Options

     Not sure i understood your qustion ? 


    Nodes = Vertica Cluster Nodes ? 


      If you add new nodes to the database should and wont influence or change any of the columns and table definition,

     When adding more nodes to your Vertica cluster you need to rebalance the cluster. This is a different task and won't affect any of the hash default column definiton.



  • Options

    Can you cross check , I See this issue with other MPP platfrom's.


    yes , Nodes = Vertica Cluster Nodes.

  • Options


     You mean to tell me that by altering the infrastructure (adding or removing nodes) will affect the default values stored in a column of a table ?

      I have never encounterd any similar situation that this happened. 


     Try to reproduce this and i will belive it.


     It might be that when you have added the new node you haven't rpelicated the proejctions across all nodes, or look at your projection description where you have custom segmentation rules and one projection sits only in node A,B, and C out of A,B,C,D,E,F. becouse you decided so. 


    See this page for more info:




  • Options

    Thank you for your info ,I will try in my dev system adding new nodes and re run hash function .


    I will let you know,



  • Options
    Navin_CNavin_C Vertica Customer

    Hash function for a string will always be unique.


    try these, they are all different.
    select hash('abc')
    select hash('ABC')
    select hash('cba')



    String Hash function provided by Vertica is meant for external use by developers, its not used by Vertica internal operations. So the distribution of the data using hash function is different that the one used above.


    So you can hash your values and store it, maybe even use it as an dictionary(don't do it if  many values and will be used as dimension).



Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file