Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

Hash function is not reproducible

Running hash twice on two tables gives different result

Comments

  • Jim_KnicelyJim_Knicely Administrator
    edited July 2019

    That seems highly unlikely.

    dbadmin=> CREATE TABLE a (user_id INT);
    CREATE TABLE
    
    dbadmin=> CREATE TABLE b (user_id INT);
    CREATE TABLE
    
    dbadmin=> INSERT INTO a SELECT 26712;
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> INSERT INTO b SELECT 26712;
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> SELECT user_id, HASH(user_id), HASH(user_id, 'a') FROM a
    dbadmin-> UNION ALL
    dbadmin-> SELECT user_id, HASH(user_id), HASH(user_id, 'a') FROM b;
     user_id |        HASH        |        HASH
    ---------+--------------------+---------------------
       26712 | 734013455889912858 | 6622807832315002947
       26712 | 734013455889912858 | 6622807832315002947
    (2 rows)
    

    What client tool are you using? Try running your query in vsql.

  • Bryan_HBryan_H Administrator

    Is user_id the same data type in both tables? I get one result as integer, and a different result as varchar:

    dbadmin=> select hash(26712, 'a');

    hash

    6622807832315002947
    (1 row)

    dbadmin=> select hash('26712', 'a');

    hash

    3488463230331277936
    (1 row)

  • Jim_KnicelyJim_Knicely Administrator
    edited July 2019

    I though of that :) But you cannot UNION ALL a VARCHAR with an INT.

    dbadmin=> \d a
                                    List of Fields by Tables
     Schema | Table | Column  | Type | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-------+---------+------+------+---------+----------+-------------+-------------
     public | a     | user_id | int  |    8 |         | f        | f           |
    (1 row)
    
    dbadmin=> \d b
                                        List of Fields by Tables
     Schema | Table | Column  |    Type     | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-------+---------+-------------+------+---------+----------+-------------+-------------
     public | b     | user_id | varchar(10) |   10 |         | f        | f           |
    (1 row)
    
    dbadmin=> SELECT * FROM a;
     user_id
    ---------
       26712
    (1 row)
    
    dbadmin=> SELECT * FROM b;
     user_id
    ---------
     26712
    (1 row)
    
    dbadmin=> SELECT user_id, hash(user_id, 'a') FROM a;
     user_id |        hash
    ---------+---------------------
       26712 | 6622807832315002947
    (1 row)
    
    dbadmin=> SELECT user_id, hash(user_id, 'a') FROM b;
     user_id |        hash
    ---------+---------------------
     26712   | 3488463230331277936
    (1 row)
    
    dbadmin=> SELECT user_id, hash(user_id, 'a') FROM a
    dbadmin-> UNION ALL
    dbadmin-> SELECT user_id, hash(user_id, 'a') FROM b;
    ERROR 3429:  For 'UNION', types int and varchar are inconsistent
    
    dbadmin=> SELECT user_id, hash(user_id, 'a') FROM b
    dbadmin-> UNION ALL
    dbadmin-> SELECT user_id, hash(user_id, 'a') FROM a;
    ERROR 3429:  For 'UNION', types varchar and int are inconsistent
    DETAIL:  Columns: user_id and user_id
    

    Maybe the client tool is doing the conversion?

    dbadmin=> SELECT user_id::VARCHAR, hash(user_id, 'a') FROM a
    dbadmin-> UNION ALL
    dbadmin-> SELECT user_id, hash(user_id, 'a') FROM b;
     user_id |        hash
    ---------+---------------------
     26712   | 6622807832315002947
     26712   | 3488463230331277936
    
  • marcothesanemarcothesane Administrator

    To illustrate what Jim and Bryan were mentioning:
    Here's a test I just ran:

    \x
    SELECT
      HASH(26712                ) AS hash_INT
    , HASH(26712::NUMERIC(18)   ) AS hash_NUMERIC_18   
    , HASH(26712::NUMERIC(32)   ) AS hash_NUMERIC_32   
    , HASH(26712::NUMERIC(18, 4)) AS hash_NUMERIC_18_4
    , HASH(26712::FLOAT         ) AS hash_FLOAT         
    , HASH(26712::VARCHAR(256)  ) AS hash_VARCHAR_256  
    , HASH(26712::VARCHAR(64)   ) AS hash_VARCHAR_64
    , HASH(26712::VARCHAR(5)    ) AS hash_VARCHAR_5 
    , HASH(26712::CHAR(256)     ) AS hash_CHAR_256    
    , HASH(26712::CHAR(5)       ) AS hash_CHAR_5      
    ;
    -- out Expanded display is on.
    -- out -[ RECORD 1 ]-----+--------------------
    -- out hash_INT          | 734013455889912858
    -- out hash_NUMERIC_18   | 734013455889912858
    -- out hash_NUMERIC_32   | 4839844074090391799
    -- out hash_NUMERIC_18_4 | 6298167399894197585
    -- out hash_FLOAT        | 7382987690975279314
    -- out hash_VARCHAR_256  | 438488188464369242
    -- out hash_VARCHAR_64   | 438488188464369242
    -- out hash_VARCHAR_5    | 438488188464369242
    -- out hash_CHAR_256     | 438488188464369242
    -- out hash_CHAR_5       | 438488188464369242
    -- out 
    -- out Time: First fetch (1 row): 7.563 ms. All rows formatted: 7.594 ms
    

    So it's not only the value , but also its type, and sometimes also the size and precision, that you're hashing, that makes the difference....

    @hanans: Can you share with us the exact type of the columns in the table definition of the columns that you're hashing?

  • Jim_KnicelyJim_Knicely Administrator
    edited July 2019

    Heh heh. Must be a mix of data types that Vertica is auto coercing allowing the UNION to succeed :)

    dbadmin=> SELECT hash(26712::FLOAT, 'a')
    dbadmin-> UNION ALL
    dbadmin-> SELECT hash(26712::INT, 'a');
            hash
    ---------------------
      304723032549234641
     6622807832315002947
    (2 rows)
    
    dbadmin=> SELECT hash(26712::FLOAT, 'a')
    dbadmin-> UNION ALL
    dbadmin-> SELECT hash(26712::VARCHAR, 'a') ;
            hash
    ---------------------
      304723032549234641
     3488463230331277936
    (2 rows)
    
    dbadmin=> SELECT hash(26712::NUMERIC, 'a')
    dbadmin-> UNION ALL
    dbadmin-> SELECT hash(26712::FLOAT, 'a');
            hash
    ---------------------
     6622807832315002947
      304723032549234641
    (2 rows)
    
  • marcothesanemarcothesane Administrator

    I went through it all again, hashing with user_id and 'a':

    \x
    SELECT
      HASH(26712                , 'a') AS hash_with_INT
    , HASH(26712::NUMERIC(18)   , 'a') AS hash_with_NUMERIC_18   
    , HASH(26712::NUMERIC(32)   , 'a') AS hash_with_NUMERIC_32   
    , HASH(26712::NUMERIC(18, 4), 'a') AS hash_with_NUMERIC_18_4
    , HASH(26712::FLOAT         , 'a') AS hash_with_FLOAT         
    , HASH(26712::VARCHAR(256)  , 'a') AS hash_with_VARCHAR_256  
    , HASH(26712::VARCHAR(64)   , 'a') AS hash_with_VARCHAR_64
    , HASH(26712::VARCHAR(5)    , 'a') AS hash_with_VARCHAR_5 
    , HASH(26712::CHAR(256)     , 'a') AS hash_with_CHAR_256    
    , HASH(26712::CHAR(5)       , 'a') AS hash_with_CHAR_5      
    ;
    -- out Expanded display is on.
    -- out -[ RECORD 1 ]----------+--------------------
    -- out hash_with_INT          | 6622807832315002947
    -- out hash_with_NUMERIC_18   | 6622807832315002947
    -- out hash_with_NUMERIC_32   | 2877367361208514378
    -- out hash_with_NUMERIC_18_4 | 8063475701858845121
    -- out hash_with_FLOAT        | 304723032549234641
    -- out hash_with_VARCHAR_256  | 3488463230331277936
    -- out hash_with_VARCHAR_64   | 3488463230331277936
    -- out hash_with_VARCHAR_5    | 3488463230331277936
    -- out hash_with_CHAR_256     | 3488463230331277936
    -- out hash_with_CHAR_5       | 3488463230331277936
    

    If we look at hanans's output, his 1st and 2nd output row in the report, with 2877367361208514378 as the hash value must come from dwh.sm_user_profile, where user_id must be between a NUMERIC(19,n) to a NUMERIC(37,n), and the 3rd output row must come from agg.agg_sm_daily_users_stats_by_platform, where user_id is either an INT or between a NUMERIC(1) and a NUMERIC(18).

  • wow!you guys are awesome!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.