Hash segmentation to node number mapping
Hi,
Does anyone can help to explain how vertica maps node_name with the Hash segmentation?
In the following example, i created a dummy table and inserted few rows in it. Then, we can see that all the rows are not equally distributed between all nodes. Distribution of data across node happens according to HASH(ID) value.
I need to understand how vertica maps/decides that which hash value will be stored in which node?
Sample Table:
dbadmin=> CREATE TABLE checkout (
dbadmin(> id INT ENCODING RLE,
dbadmin(> transaction_timestamp TIMESTAMP ENCODING DELTAVAL,
dbadmin(> transaction_amount DOUBLE PRECISION
dbadmin(> )
dbadmin-> ORDER BY transaction_timestamp
dbadmin-> SEGMENTED BY hash(id) ALL NODES;
CREATE TABLE
dbadmin=> select count(*) from checkout;
count
-------
16
(1 row)
dbadmin=> select local_node_name(), count(*) from checkout group by local_node_name();
local_node_name | count
-----------------+-------
v_vdb_node0001 | 5
v_vdb_node0002 | 9
v_vdb_node0003 | 2
(3 rows)
dbadmin=> select id, hash(id), transaction_timestamp, transaction_amount from checkout order by 1;
id | hash | transaction_timestamp | transaction_amount
----+---------------------+----------------------------+--------------------
1 | 5783548743464686114 | 2018-11-27 04:13:41.157426 | 100
2 | 1618211815126016456 | 2018-11-29 04:13:45.306561 | 200
3 | 3883506187811235133 | 2018-11-30 04:14:02.686752 | 300
4 | 5130139942120141781 | 2018-12-01 04:14:08.806913 | 400
5 | 8299427032879314813 | 2018-12-02 04:14:14.25534 | 500
5 | 8299427032879314813 | 2018-12-02 04:17:34.520367 | 500
6 | 3939990413061782359 | 2018-12-03 04:17:34.551429 | 600
7 | 3110550038591389753 | 2018-12-04 04:17:34.580134 | 700
8 | 9046158759674539325 | 2018-12-05 04:17:34.608005 | 800
9 | 6898810103076931951 | 2018-12-06 04:17:34.636077 | 900
10 | 1909575676721759820 | 2018-12-07 04:17:34.664295 | 1000
11 | 2646627613614353023 | 2018-12-08 04:17:34.692378 | 1100
12 | 6042820943135784349 | 2018-12-09 04:17:34.720272 | 1200
13 | 2568111303865075193 | 2018-12-10 04:17:34.748703 | 1300
14 | 4970994696257251037 | 2018-12-11 04:17:34.7767 | 1400
15 | 6850683294198565620 | 2018-12-12 04:17:36.528584 | 1500
(16 rows)
0

Comments
Hi Raghav, @RaghavA
I know this is an old post, but just wanted to give you some pointers that helped me a lot:
1. https://github.com/jackghm/Vertica/wiki/Vertica-Node-Data-Distribution
2. https://github.com/jackghm/Vertica/wiki/HP-Vertica-Tips,-Tricks,-and-Best-Practices
3. https://15721.courses.cs.cmu.edu/spring2019/papers/15-execution/shrinivas-icde2013.pdf
Hope these help!
Have fun with Vertica!