I though of that But you cannot UNION ALL a VARCHAR with an INT.
dbadmin=> \d a
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+---------+------+------+---------+----------+-------------+-------------
public | a | user_id | int | 8 | | f | f |
(1 row)
dbadmin=> \d b
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+---------+-------------+------+---------+----------+-------------+-------------
public | b | user_id | varchar(10) | 10 | | f | f |
(1 row)
dbadmin=> SELECT * FROM a;
user_id
---------
26712
(1 row)
dbadmin=> SELECT * FROM b;
user_id
---------
26712
(1 row)
dbadmin=> SELECT user_id, hash(user_id, 'a') FROM a;
user_id | hash
---------+---------------------
26712 | 6622807832315002947
(1 row)
dbadmin=> SELECT user_id, hash(user_id, 'a') FROM b;
user_id | hash
---------+---------------------
26712 | 3488463230331277936
(1 row)
dbadmin=> SELECT user_id, hash(user_id, 'a') FROM a
dbadmin-> UNION ALL
dbadmin-> SELECT user_id, hash(user_id, 'a') FROM b;
ERROR 3429: For 'UNION', types int and varchar are inconsistent
dbadmin=> SELECT user_id, hash(user_id, 'a') FROM b
dbadmin-> UNION ALL
dbadmin-> SELECT user_id, hash(user_id, 'a') FROM a;
ERROR 3429: For 'UNION', types varchar and int are inconsistent
DETAIL: Columns: user_id and user_id
Maybe the client tool is doing the conversion?
dbadmin=> SELECT user_id::VARCHAR, hash(user_id, 'a') FROM a
dbadmin-> UNION ALL
dbadmin-> SELECT user_id, hash(user_id, 'a') FROM b;
user_id | hash
---------+---------------------
26712 | 6622807832315002947
26712 | 3488463230331277936
To illustrate what Jim and Bryan were mentioning:
Here's a test I just ran:
\x
SELECT
HASH(26712 ) AS hash_INT
, HASH(26712::NUMERIC(18) ) AS hash_NUMERIC_18
, HASH(26712::NUMERIC(32) ) AS hash_NUMERIC_32
, HASH(26712::NUMERIC(18, 4)) AS hash_NUMERIC_18_4
, HASH(26712::FLOAT ) AS hash_FLOAT
, HASH(26712::VARCHAR(256) ) AS hash_VARCHAR_256
, HASH(26712::VARCHAR(64) ) AS hash_VARCHAR_64
, HASH(26712::VARCHAR(5) ) AS hash_VARCHAR_5
, HASH(26712::CHAR(256) ) AS hash_CHAR_256
, HASH(26712::CHAR(5) ) AS hash_CHAR_5
;
-- out Expanded display is on.
-- out -[ RECORD 1 ]-----+--------------------
-- out hash_INT | 734013455889912858
-- out hash_NUMERIC_18 | 734013455889912858
-- out hash_NUMERIC_32 | 4839844074090391799
-- out hash_NUMERIC_18_4 | 6298167399894197585
-- out hash_FLOAT | 7382987690975279314
-- out hash_VARCHAR_256 | 438488188464369242
-- out hash_VARCHAR_64 | 438488188464369242
-- out hash_VARCHAR_5 | 438488188464369242
-- out hash_CHAR_256 | 438488188464369242
-- out hash_CHAR_5 | 438488188464369242
-- out
-- out Time: First fetch (1 row): 7.563 ms. All rows formatted: 7.594 ms
So it's not only the value , but also its type, and sometimes also the size and precision, that you're hashing, that makes the difference....
@hanans: Can you share with us the exact type of the columns in the table definition of the columns that you're hashing?
I went through it all again, hashing with user_id and 'a':
\x
SELECT
HASH(26712 , 'a') AS hash_with_INT
, HASH(26712::NUMERIC(18) , 'a') AS hash_with_NUMERIC_18
, HASH(26712::NUMERIC(32) , 'a') AS hash_with_NUMERIC_32
, HASH(26712::NUMERIC(18, 4), 'a') AS hash_with_NUMERIC_18_4
, HASH(26712::FLOAT , 'a') AS hash_with_FLOAT
, HASH(26712::VARCHAR(256) , 'a') AS hash_with_VARCHAR_256
, HASH(26712::VARCHAR(64) , 'a') AS hash_with_VARCHAR_64
, HASH(26712::VARCHAR(5) , 'a') AS hash_with_VARCHAR_5
, HASH(26712::CHAR(256) , 'a') AS hash_with_CHAR_256
, HASH(26712::CHAR(5) , 'a') AS hash_with_CHAR_5
;
-- out Expanded display is on.
-- out -[ RECORD 1 ]----------+--------------------
-- out hash_with_INT | 6622807832315002947
-- out hash_with_NUMERIC_18 | 6622807832315002947
-- out hash_with_NUMERIC_32 | 2877367361208514378
-- out hash_with_NUMERIC_18_4 | 8063475701858845121
-- out hash_with_FLOAT | 304723032549234641
-- out hash_with_VARCHAR_256 | 3488463230331277936
-- out hash_with_VARCHAR_64 | 3488463230331277936
-- out hash_with_VARCHAR_5 | 3488463230331277936
-- out hash_with_CHAR_256 | 3488463230331277936
-- out hash_with_CHAR_5 | 3488463230331277936
If we look at hanans's output, his 1st and 2nd output row in the report, with 2877367361208514378 as the hash value must come from dwh.sm_user_profile, where user_id must be between a NUMERIC(19,n) to a NUMERIC(37,n), and the 3rd output row must come from agg.agg_sm_daily_users_stats_by_platform, where user_id is either an INT or between a NUMERIC(1) and a NUMERIC(18).
Comments
That seems highly unlikely.
What client tool are you using? Try running your query in vsql.
Is user_id the same data type in both tables? I get one result as integer, and a different result as varchar:
I though of that But you cannot UNION ALL a VARCHAR with an INT.
Maybe the client tool is doing the conversion?
To illustrate what Jim and Bryan were mentioning:
Here's a test I just ran:
So it's not only the value , but also its type, and sometimes also the size and precision, that you're hashing, that makes the difference....
@hanans: Can you share with us the exact type of the columns in the table definition of the columns that you're hashing?
Heh heh. Must be a mix of data types that Vertica is auto coercing allowing the UNION to succeed
I went through it all again, hashing with
user_id
and'a'
:If we look at hanans's output, his 1st and 2nd output row in the report, with 2877367361208514378 as the hash value must come from
dwh.sm_user_profile
, whereuser_id
must be between a NUMERIC(19,n) to a NUMERIC(37,n), and the 3rd output row must come fromagg.agg_sm_daily_users_stats_by_platform
, whereuser_id
is either an INT or between a NUMERIC(1) and a NUMERIC(18).wow!you guys are awesome!