Hash and Re Hash

VerticaMG · February 2016

Hi,

I need help , we are looking to use hash function to our bigest text .

For example

select 'VerticaGM'as Name, hash('VerticaGM') as Name_Value;

Result :-

Name Name_Value

VerticaGM 4870610383584265370

can I rehash 4870610383584265370 to get VerticaGM ?

Thank you.

Adrian_Oprea_1 · February 2016

Hash functions are unidirectional, that is they are not be reversible.

you need ot use other functions

VerticaMG · February 2016

what are other functions or UDF available?

I appreciate for your help.

Adrian_Oprea_1 · February 2016

There is an encryption pkg at

https://github.com/vertica/Vertica-Extension-Packages/tree/master/encryption_package

- this might doing the trick for you if you are trying to hyde it.

If you don`t wanna have it in your result sets becouse is to big then you can choose the relation aproach and create Key to it instead of having it draged around your most queried table.

VerticaMG · February 2016

Do you have built in function's to hash and re hash.

Our problem we have very large text column as PK , charter joins are very costly .

So we are looking to hash it , so Integer joins are faster . At same time we need to rehash it to orginal value .

Ex : Vertica hash value is 5605021174313838236

If I rehase 5605021174313838236 = Vertica

I'm looking for funcation to do this.

eli_revach · February 2016

Hi
You can easily create projection that used an expression , the projection will include both , your original column and the hash based column , then you can explicitly used this projection for your queries :

Eg:

CREATE PROJECTION my_prg (month, sales) AS 
   SELECT hash(sales) AS month_hash, sales AS sales
   FROM monthly_sales ;

I hope you will find it useful

Thanks

Adrian_Oprea_1 · February 2016

ok ,

Why not adding a new default column to hold the hash for the text field.

create table huge_text
 (
 text LONG VARCHAR(100000),
 idhash int default hash(text)
 )
;
-- tbl2
create table huge_text2
 (
 text LONG VARCHAR(100000),
 idhash int default hash(text)
 )

Insert some data &colon;

insert into huge_text(text) values ('The weather tomorrow will be cold 
and rainy and then on the day after, the sun..... 90k more character');

See the data&colon;

select * from huge_text2;

 810386705167248200 | The weather tomorrow will be cold and rainy

See the data in both tables:

select idhash,text from huge_text
 union all 
select idhash,text from huge_text2;
-----------------------------------------------
 810386705167248200 | The weather tomorrow will be cold a......
 810386705167248200 | The weather tomorrow will be cold a......

Make you idhash primary key:

ALTER TABLE   huge_text ADD PRIMARY KEY (idhash)

Make fk on idhaah of the tbl 2 :

ALTER TABLE public.huge_text2 ADD CONSTRAINT huge_text2_fk1 
FOREIGN KEY (idhash) references public.huge_text (idhash);

Your model is optimized now , i guess

Also adapt your projection build based on your query requirments

VerticaMG · February 2016

Thank you , Much appricated!

Does hash value will change if we add more nodes

Ex : we have 30 nodes , we updated to 60 nodes

Adrian_Oprea_1 · February 2016

Not sure i understood your qustion ?

Nodes = Vertica Cluster Nodes ?

If you add new nodes to the database should and wont influence or change any of the columns and table definition,

When adding more nodes to your Vertica cluster you need to rebalance the cluster. This is a different task and won't affect any of the hash default column definiton.

VerticaMG · February 2016

Can you cross check , I See this issue with other MPP platfrom's.

yes , Nodes = Vertica Cluster Nodes.

Adrian_Oprea_1 · February 2016

You mean to tell me that by altering the infrastructure (adding or removing nodes) will affect the default values stored in a column of a table ?

I have never encounterd any similar situation that this happened.

Try to reproduce this and i will belive it.

It might be that when you have added the new node you haven't rpelicated the proejctions across all nodes, or look at your projection description where you have custom segmentation rules and one projection sits only in node A,B, and C out of A,B,C,D,E,F. becouse you decided so.

See this page for more info:

https://community.dev.hpe.com/t5/Vertica-Knowledge-Base/Understanding-Rebalancing-Part-1-What-Happens-During-Rebalancing/ta-p/231792

VerticaMG · February 2016

Thank you for your info ,I will try in my dev system adding new nodes and re run hash function .

I will let you know,

Navin_C · February 2016

Hash function for a string will always be unique.

try these, they are all different.
select hash('abc')
select hash('ABC')
select hash('cba')

String Hash function provided by Vertica is meant for external use by developers, its not used by Vertica internal operations. So the distribution of the data using hash function is different that the one used above.

So you can hash your values and store it, maybe even use it as an dictionary(don't do it if many values and will be used as dimension).

We're Moving!

Create My New Community Account Now

Hash and Re Hash

Comments

Leave a Comment