I though of that But you cannot UNION ALL a VARCHAR with an INT.
dbadmin=> \d a
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+---------+------+------+---------+----------+-------------+-------------
public | a | user_id | int | 8 | | f | f |
(1 row)
dbadmin=> \d b
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+---------+-------------+------+---------+----------+-------------+-------------
public | b | user_id | varchar(10) | 10 | | f | f |
(1 row)
dbadmin=> SELECT * FROM a;
user_id
---------
26712
(1 row)
dbadmin=> SELECT * FROM b;
user_id
---------
26712
(1 row)
dbadmin=> SELECT user_id, hash(user_id, 'a') FROM a;
user_id | hash
---------+---------------------
26712 | 6622807832315002947
(1 row)
dbadmin=> SELECT user_id, hash(user_id, 'a') FROM b;
user_id | hash
---------+---------------------
26712 | 3488463230331277936
(1 row)
dbadmin=> SELECT user_id, hash(user_id, 'a') FROM a
dbadmin-> UNION ALL
dbadmin-> SELECT user_id, hash(user_id, 'a') FROM b;
ERROR 3429: For 'UNION', types int and varchar are inconsistent
dbadmin=> SELECT user_id, hash(user_id, 'a') FROM b
dbadmin-> UNION ALL
dbadmin-> SELECT user_id, hash(user_id, 'a') FROM a;
ERROR 3429: For 'UNION', types varchar and int are inconsistent
DETAIL: Columns: user_id and user_id
Maybe the client tool is doing the conversion?
dbadmin=> SELECT user_id::VARCHAR, hash(user_id, 'a') FROM a
dbadmin-> UNION ALL
dbadmin-> SELECT user_id, hash(user_id, 'a') FROM b;
user_id | hash
---------+---------------------
26712 | 6622807832315002947
26712 | 3488463230331277936
To illustrate what Jim and Bryan were mentioning:
Here's a test I just ran:
\x
SELECT
HASH(26712 ) AS hash_INT
, HASH(26712::NUMERIC(18) ) AS hash_NUMERIC_18
, HASH(26712::NUMERIC(32) ) AS hash_NUMERIC_32
, HASH(26712::NUMERIC(18, 4)) AS hash_NUMERIC_18_4
, HASH(26712::FLOAT ) AS hash_FLOAT
, HASH(26712::VARCHAR(256) ) AS hash_VARCHAR_256
, HASH(26712::VARCHAR(64) ) AS hash_VARCHAR_64
, HASH(26712::VARCHAR(5) ) AS hash_VARCHAR_5
, HASH(26712::CHAR(256) ) AS hash_CHAR_256
, HASH(26712::CHAR(5) ) AS hash_CHAR_5
;
-- out Expanded display is on.
-- out -[ RECORD 1 ]-----+--------------------
-- out hash_INT | 734013455889912858
-- out hash_NUMERIC_18 | 734013455889912858
-- out hash_NUMERIC_32 | 4839844074090391799
-- out hash_NUMERIC_18_4 | 6298167399894197585
-- out hash_FLOAT | 7382987690975279314
-- out hash_VARCHAR_256 | 438488188464369242
-- out hash_VARCHAR_64 | 438488188464369242
-- out hash_VARCHAR_5 | 438488188464369242
-- out hash_CHAR_256 | 438488188464369242
-- out hash_CHAR_5 | 438488188464369242
-- out
-- out Time: First fetch (1 row): 7.563 ms. All rows formatted: 7.594 ms
So it's not only the value , but also its type, and sometimes also the size and precision, that you're hashing, that makes the difference....
@hanans: Can you share with us the exact type of the columns in the table definition of the columns that you're hashing?
I went through it all again, hashing with user_id and 'a':
\x
SELECT
HASH(26712 , 'a') AS hash_with_INT
, HASH(26712::NUMERIC(18) , 'a') AS hash_with_NUMERIC_18
, HASH(26712::NUMERIC(32) , 'a') AS hash_with_NUMERIC_32
, HASH(26712::NUMERIC(18, 4), 'a') AS hash_with_NUMERIC_18_4
, HASH(26712::FLOAT , 'a') AS hash_with_FLOAT
, HASH(26712::VARCHAR(256) , 'a') AS hash_with_VARCHAR_256
, HASH(26712::VARCHAR(64) , 'a') AS hash_with_VARCHAR_64
, HASH(26712::VARCHAR(5) , 'a') AS hash_with_VARCHAR_5
, HASH(26712::CHAR(256) , 'a') AS hash_with_CHAR_256
, HASH(26712::CHAR(5) , 'a') AS hash_with_CHAR_5
;
-- out Expanded display is on.
-- out -[ RECORD 1 ]----------+--------------------
-- out hash_with_INT | 6622807832315002947
-- out hash_with_NUMERIC_18 | 6622807832315002947
-- out hash_with_NUMERIC_32 | 2877367361208514378
-- out hash_with_NUMERIC_18_4 | 8063475701858845121
-- out hash_with_FLOAT | 304723032549234641
-- out hash_with_VARCHAR_256 | 3488463230331277936
-- out hash_with_VARCHAR_64 | 3488463230331277936
-- out hash_with_VARCHAR_5 | 3488463230331277936
-- out hash_with_CHAR_256 | 3488463230331277936
-- out hash_with_CHAR_5 | 3488463230331277936
If we look at hanans's output, his 1st and 2nd output row in the report, with 2877367361208514378 as the hash value must come from dwh.sm_user_profile, where user_id must be between a NUMERIC(19,n) to a NUMERIC(37,n), and the 3rd output row must come from agg.agg_sm_daily_users_stats_by_platform, where user_id is either an INT or between a NUMERIC(1) and a NUMERIC(18).
Comments
That seems highly unlikely.
dbadmin=> CREATE TABLE a (user_id INT); CREATE TABLE dbadmin=> CREATE TABLE b (user_id INT); CREATE TABLE dbadmin=> INSERT INTO a SELECT 26712; OUTPUT -------- 1 (1 row) dbadmin=> INSERT INTO b SELECT 26712; OUTPUT -------- 1 (1 row) dbadmin=> SELECT user_id, HASH(user_id), HASH(user_id, 'a') FROM a dbadmin-> UNION ALL dbadmin-> SELECT user_id, HASH(user_id), HASH(user_id, 'a') FROM b; user_id | HASH | HASH ---------+--------------------+--------------------- 26712 | 734013455889912858 | 6622807832315002947 26712 | 734013455889912858 | 6622807832315002947 (2 rows)What client tool are you using? Try running your query in vsql.
Is user_id the same data type in both tables? I get one result as integer, and a different result as varchar:
I though of that
But you cannot UNION ALL a VARCHAR with an INT.
dbadmin=> \d a List of Fields by Tables Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key --------+-------+---------+------+------+---------+----------+-------------+------------- public | a | user_id | int | 8 | | f | f | (1 row) dbadmin=> \d b List of Fields by Tables Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key --------+-------+---------+-------------+------+---------+----------+-------------+------------- public | b | user_id | varchar(10) | 10 | | f | f | (1 row) dbadmin=> SELECT * FROM a; user_id --------- 26712 (1 row) dbadmin=> SELECT * FROM b; user_id --------- 26712 (1 row) dbadmin=> SELECT user_id, hash(user_id, 'a') FROM a; user_id | hash ---------+--------------------- 26712 | 6622807832315002947 (1 row) dbadmin=> SELECT user_id, hash(user_id, 'a') FROM b; user_id | hash ---------+--------------------- 26712 | 3488463230331277936 (1 row) dbadmin=> SELECT user_id, hash(user_id, 'a') FROM a dbadmin-> UNION ALL dbadmin-> SELECT user_id, hash(user_id, 'a') FROM b; ERROR 3429: For 'UNION', types int and varchar are inconsistent dbadmin=> SELECT user_id, hash(user_id, 'a') FROM b dbadmin-> UNION ALL dbadmin-> SELECT user_id, hash(user_id, 'a') FROM a; ERROR 3429: For 'UNION', types varchar and int are inconsistent DETAIL: Columns: user_id and user_idMaybe the client tool is doing the conversion?
To illustrate what Jim and Bryan were mentioning:
Here's a test I just ran:
So it's not only the value , but also its type, and sometimes also the size and precision, that you're hashing, that makes the difference....
@hanans: Can you share with us the exact type of the columns in the table definition of the columns that you're hashing?
Heh heh. Must be a mix of data types that Vertica is auto coercing allowing the UNION to succeed
dbadmin=> SELECT hash(26712::FLOAT, 'a') dbadmin-> UNION ALL dbadmin-> SELECT hash(26712::INT, 'a'); hash --------------------- 304723032549234641 6622807832315002947 (2 rows) dbadmin=> SELECT hash(26712::FLOAT, 'a') dbadmin-> UNION ALL dbadmin-> SELECT hash(26712::VARCHAR, 'a') ; hash --------------------- 304723032549234641 3488463230331277936 (2 rows) dbadmin=> SELECT hash(26712::NUMERIC, 'a') dbadmin-> UNION ALL dbadmin-> SELECT hash(26712::FLOAT, 'a'); hash --------------------- 6622807832315002947 304723032549234641 (2 rows)I went through it all again, hashing with
user_idand'a':If we look at hanans's output, his 1st and 2nd output row in the report, with 2877367361208514378 as the hash value must come from
dwh.sm_user_profile, whereuser_idmust be between a NUMERIC(19,n) to a NUMERIC(37,n), and the 3rd output row must come fromagg.agg_sm_daily_users_stats_by_platform, whereuser_idis either an INT or between a NUMERIC(1) and a NUMERIC(18).wow!you guys are awesome!