We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Hash function is not reproducible — Vertica Forum

Hash function is not reproducible

Running hash twice on two tables gives different result

Comments

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited July 2019

    That seems highly unlikely.

    dbadmin=> CREATE TABLE a (user_id INT);
    CREATE TABLE
    
    dbadmin=> CREATE TABLE b (user_id INT);
    CREATE TABLE
    
    dbadmin=> INSERT INTO a SELECT 26712;
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> INSERT INTO b SELECT 26712;
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> SELECT user_id, HASH(user_id), HASH(user_id, 'a') FROM a
    dbadmin-> UNION ALL
    dbadmin-> SELECT user_id, HASH(user_id), HASH(user_id, 'a') FROM b;
     user_id |        HASH        |        HASH
    ---------+--------------------+---------------------
       26712 | 734013455889912858 | 6622807832315002947
       26712 | 734013455889912858 | 6622807832315002947
    (2 rows)
    

    What client tool are you using? Try running your query in vsql.

  • Bryan_HBryan_H Vertica Employee Administrator

    Is user_id the same data type in both tables? I get one result as integer, and a different result as varchar:

    dbadmin=> select hash(26712, 'a');

    hash

    6622807832315002947
    (1 row)

    dbadmin=> select hash('26712', 'a');

    hash

    3488463230331277936
    (1 row)

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited July 2019

    I though of that :) But you cannot UNION ALL a VARCHAR with an INT.

    dbadmin=> \d a
                                    List of Fields by Tables
     Schema | Table | Column  | Type | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-------+---------+------+------+---------+----------+-------------+-------------
     public | a     | user_id | int  |    8 |         | f        | f           |
    (1 row)
    
    dbadmin=> \d b
                                        List of Fields by Tables
     Schema | Table | Column  |    Type     | Size | Default | Not Null | Primary Key | Foreign Key
    --------+-------+---------+-------------+------+---------+----------+-------------+-------------
     public | b     | user_id | varchar(10) |   10 |         | f        | f           |
    (1 row)
    
    dbadmin=> SELECT * FROM a;
     user_id
    ---------
       26712
    (1 row)
    
    dbadmin=> SELECT * FROM b;
     user_id
    ---------
     26712
    (1 row)
    
    dbadmin=> SELECT user_id, hash(user_id, 'a') FROM a;
     user_id |        hash
    ---------+---------------------
       26712 | 6622807832315002947
    (1 row)
    
    dbadmin=> SELECT user_id, hash(user_id, 'a') FROM b;
     user_id |        hash
    ---------+---------------------
     26712   | 3488463230331277936
    (1 row)
    
    dbadmin=> SELECT user_id, hash(user_id, 'a') FROM a
    dbadmin-> UNION ALL
    dbadmin-> SELECT user_id, hash(user_id, 'a') FROM b;
    ERROR 3429:  For 'UNION', types int and varchar are inconsistent
    
    dbadmin=> SELECT user_id, hash(user_id, 'a') FROM b
    dbadmin-> UNION ALL
    dbadmin-> SELECT user_id, hash(user_id, 'a') FROM a;
    ERROR 3429:  For 'UNION', types varchar and int are inconsistent
    DETAIL:  Columns: user_id and user_id
    

    Maybe the client tool is doing the conversion?

    dbadmin=> SELECT user_id::VARCHAR, hash(user_id, 'a') FROM a
    dbadmin-> UNION ALL
    dbadmin-> SELECT user_id, hash(user_id, 'a') FROM b;
     user_id |        hash
    ---------+---------------------
     26712   | 6622807832315002947
     26712   | 3488463230331277936
    
  • marcothesanemarcothesane - Select Field - Administrator

    To illustrate what Jim and Bryan were mentioning:
    Here's a test I just ran:

    \x
    SELECT
      HASH(26712                ) AS hash_INT
    , HASH(26712::NUMERIC(18)   ) AS hash_NUMERIC_18   
    , HASH(26712::NUMERIC(32)   ) AS hash_NUMERIC_32   
    , HASH(26712::NUMERIC(18, 4)) AS hash_NUMERIC_18_4
    , HASH(26712::FLOAT         ) AS hash_FLOAT         
    , HASH(26712::VARCHAR(256)  ) AS hash_VARCHAR_256  
    , HASH(26712::VARCHAR(64)   ) AS hash_VARCHAR_64
    , HASH(26712::VARCHAR(5)    ) AS hash_VARCHAR_5 
    , HASH(26712::CHAR(256)     ) AS hash_CHAR_256    
    , HASH(26712::CHAR(5)       ) AS hash_CHAR_5      
    ;
    -- out Expanded display is on.
    -- out -[ RECORD 1 ]-----+--------------------
    -- out hash_INT          | 734013455889912858
    -- out hash_NUMERIC_18   | 734013455889912858
    -- out hash_NUMERIC_32   | 4839844074090391799
    -- out hash_NUMERIC_18_4 | 6298167399894197585
    -- out hash_FLOAT        | 7382987690975279314
    -- out hash_VARCHAR_256  | 438488188464369242
    -- out hash_VARCHAR_64   | 438488188464369242
    -- out hash_VARCHAR_5    | 438488188464369242
    -- out hash_CHAR_256     | 438488188464369242
    -- out hash_CHAR_5       | 438488188464369242
    -- out 
    -- out Time: First fetch (1 row): 7.563 ms. All rows formatted: 7.594 ms
    

    So it's not only the value , but also its type, and sometimes also the size and precision, that you're hashing, that makes the difference....

    @hanans: Can you share with us the exact type of the columns in the table definition of the columns that you're hashing?

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited July 2019

    Heh heh. Must be a mix of data types that Vertica is auto coercing allowing the UNION to succeed :)

    dbadmin=> SELECT hash(26712::FLOAT, 'a')
    dbadmin-> UNION ALL
    dbadmin-> SELECT hash(26712::INT, 'a');
            hash
    ---------------------
      304723032549234641
     6622807832315002947
    (2 rows)
    
    dbadmin=> SELECT hash(26712::FLOAT, 'a')
    dbadmin-> UNION ALL
    dbadmin-> SELECT hash(26712::VARCHAR, 'a') ;
            hash
    ---------------------
      304723032549234641
     3488463230331277936
    (2 rows)
    
    dbadmin=> SELECT hash(26712::NUMERIC, 'a')
    dbadmin-> UNION ALL
    dbadmin-> SELECT hash(26712::FLOAT, 'a');
            hash
    ---------------------
     6622807832315002947
      304723032549234641
    (2 rows)
    
  • marcothesanemarcothesane - Select Field - Administrator

    I went through it all again, hashing with user_id and 'a':

    \x
    SELECT
      HASH(26712                , 'a') AS hash_with_INT
    , HASH(26712::NUMERIC(18)   , 'a') AS hash_with_NUMERIC_18   
    , HASH(26712::NUMERIC(32)   , 'a') AS hash_with_NUMERIC_32   
    , HASH(26712::NUMERIC(18, 4), 'a') AS hash_with_NUMERIC_18_4
    , HASH(26712::FLOAT         , 'a') AS hash_with_FLOAT         
    , HASH(26712::VARCHAR(256)  , 'a') AS hash_with_VARCHAR_256  
    , HASH(26712::VARCHAR(64)   , 'a') AS hash_with_VARCHAR_64
    , HASH(26712::VARCHAR(5)    , 'a') AS hash_with_VARCHAR_5 
    , HASH(26712::CHAR(256)     , 'a') AS hash_with_CHAR_256    
    , HASH(26712::CHAR(5)       , 'a') AS hash_with_CHAR_5      
    ;
    -- out Expanded display is on.
    -- out -[ RECORD 1 ]----------+--------------------
    -- out hash_with_INT          | 6622807832315002947
    -- out hash_with_NUMERIC_18   | 6622807832315002947
    -- out hash_with_NUMERIC_32   | 2877367361208514378
    -- out hash_with_NUMERIC_18_4 | 8063475701858845121
    -- out hash_with_FLOAT        | 304723032549234641
    -- out hash_with_VARCHAR_256  | 3488463230331277936
    -- out hash_with_VARCHAR_64   | 3488463230331277936
    -- out hash_with_VARCHAR_5    | 3488463230331277936
    -- out hash_with_CHAR_256     | 3488463230331277936
    -- out hash_with_CHAR_5       | 3488463230331277936
    

    If we look at hanans's output, his 1st and 2nd output row in the report, with 2877367361208514378 as the hash value must come from dwh.sm_user_profile, where user_id must be between a NUMERIC(19,n) to a NUMERIC(37,n), and the 3rd output row must come from agg.agg_sm_daily_users_stats_by_platform, where user_id is either an INT or between a NUMERIC(1) and a NUMERIC(18).

  • wow!you guys are awesome!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file