Hash Function Result Depends on Data Type and Sometimes Size

Jim_KnicelyJim_Knicely - Select Field - Administrator

The Vertica HASH function calculates a hash value over the function arguments, producing a value in the range 0 <= x < 2^63. When calculating a hash value, the HASH function takes into consideration the data type and size of the original value.

Be careful when comparing hashed column values from different tables where the columns have different data types and size as they may not have the same hashed value.

Example:

dbadmin=> CREATE TABLE a (c INT);
CREATE TABLE

dbadmin=> INSERT INTO a SELECT 1;
OUTPUT
--------
      1
(1 row)

dbadmin=> CREATE TABLE b ( c NUMERIC(10, 1));
CREATE TABLE

dbadmin=> INSERT INTO b SELECT 1;
OUTPUT
--------
      1
(1 row)

dbadmin=> SELECT COUNT(*) FROM a JOIN b ON b.c = a.c;
COUNT
-------
     1
(1 row)

dbadmin=> SELECT COUNT(*) FROM a JOIN b ON HASH(b.c) = HASH(a.c);
COUNT
-------
     0
(1 row)

dbadmin=> \x
Expanded display is on.

dbadmin=> SELECT HASH(12345::INT)             "12345::INT",
dbadmin->        HASH(12345::VARCHAR(10))     "12345::VARCHAR(10)",
dbadmin->        HASH(12345::VARCHAR(100))    "12345::VARCHAR(100)",
dbadmin->        HASH(12345::NUMERIC(100,0))  "12345::NUMERIC(100,0)",
dbadmin->        HASH(12345::NUMERIC(100,5))  "12345::NUMERIC(100,5)",
dbadmin->        HASH(12345::NUMERIC(100,10)) "12345::NUMERIC(100,10)",
dbadmin->        HASH(12345::FLOAT)           "12345::FLOAT";
-[ RECORD 1 ]----------+--------------------
12345::INT             | 4960063907862199528
12345::VARCHAR(10)     | 7521734626139061666
12345::VARCHAR(100)    | 7521734626139061666
12345::NUMERIC(100,0)  | 3327062938535469805
12345::NUMERIC(100,5)  | 6342074168025654472
12345::NUMERIC(100,10) | 560125968008200468
12345::FLOAT           | 4397895841065323663

Helpful Links:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Mathematical/HASH.htm

Have fun!

Sign In or Register to comment.