Hash Function Result Depends on Data Type and Sometimes Size
Jim_Knicely
- Select Field - Administrator
The Vertica HASH function calculates a hash value over the function arguments, producing a value in the range 0 <= x < 2^63. When calculating a hash value, the HASH function takes into consideration the data type and size of the original value.
Be careful when comparing hashed column values from different tables where the columns have different data types and size as they may not have the same hashed value.
Example:
dbadmin=> CREATE TABLE a (c INT); CREATE TABLE dbadmin=> INSERT INTO a SELECT 1; OUTPUT -------- 1 (1 row) dbadmin=> CREATE TABLE b ( c NUMERIC(10, 1)); CREATE TABLE dbadmin=> INSERT INTO b SELECT 1; OUTPUT -------- 1 (1 row) dbadmin=> SELECT COUNT(*) FROM a JOIN b ON b.c = a.c; COUNT ------- 1 (1 row) dbadmin=> SELECT COUNT(*) FROM a JOIN b ON HASH(b.c) = HASH(a.c); COUNT ------- 0 (1 row) dbadmin=> \x Expanded display is on. dbadmin=> SELECT HASH(12345::INT) "12345::INT", dbadmin-> HASH(12345::VARCHAR(10)) "12345::VARCHAR(10)", dbadmin-> HASH(12345::VARCHAR(100)) "12345::VARCHAR(100)", dbadmin-> HASH(12345::NUMERIC(100,0)) "12345::NUMERIC(100,0)", dbadmin-> HASH(12345::NUMERIC(100,5)) "12345::NUMERIC(100,5)", dbadmin-> HASH(12345::NUMERIC(100,10)) "12345::NUMERIC(100,10)", dbadmin-> HASH(12345::FLOAT) "12345::FLOAT"; -[ RECORD 1 ]----------+-------------------- 12345::INT | 4960063907862199528 12345::VARCHAR(10) | 7521734626139061666 12345::VARCHAR(100) | 7521734626139061666 12345::NUMERIC(100,0) | 3327062938535469805 12345::NUMERIC(100,5) | 6342074168025654472 12345::NUMERIC(100,10) | 560125968008200468 12345::FLOAT | 4397895841065323663
Helpful Links:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Mathematical/HASH.htm
Have fun!
1