Hash segmentation to node number mapping
Hi,
Does anyone can help to explain how vertica maps node_name with the Hash segmentation?
In the following example, i created a dummy table and inserted few rows in it. Then, we can see that all the rows are not equally distributed between all nodes. Distribution of data across node happens according to HASH(ID) value.
I need to understand how vertica maps/decides that which hash value will be stored in which node?
Sample Table: dbadmin=> CREATE TABLE checkout ( dbadmin(> id INT ENCODING RLE, dbadmin(> transaction_timestamp TIMESTAMP ENCODING DELTAVAL, dbadmin(> transaction_amount DOUBLE PRECISION dbadmin(> ) dbadmin-> ORDER BY transaction_timestamp dbadmin-> SEGMENTED BY hash(id) ALL NODES; CREATE TABLE dbadmin=> select count(*) from checkout; count ------- 16 (1 row) dbadmin=> select local_node_name(), count(*) from checkout group by local_node_name(); local_node_name | count -----------------+------- v_vdb_node0001 | 5 v_vdb_node0002 | 9 v_vdb_node0003 | 2 (3 rows) dbadmin=> select id, hash(id), transaction_timestamp, transaction_amount from checkout order by 1; id | hash | transaction_timestamp | transaction_amount ----+---------------------+----------------------------+-------------------- 1 | 5783548743464686114 | 2018-11-27 04:13:41.157426 | 100 2 | 1618211815126016456 | 2018-11-29 04:13:45.306561 | 200 3 | 3883506187811235133 | 2018-11-30 04:14:02.686752 | 300 4 | 5130139942120141781 | 2018-12-01 04:14:08.806913 | 400 5 | 8299427032879314813 | 2018-12-02 04:14:14.25534 | 500 5 | 8299427032879314813 | 2018-12-02 04:17:34.520367 | 500 6 | 3939990413061782359 | 2018-12-03 04:17:34.551429 | 600 7 | 3110550038591389753 | 2018-12-04 04:17:34.580134 | 700 8 | 9046158759674539325 | 2018-12-05 04:17:34.608005 | 800 9 | 6898810103076931951 | 2018-12-06 04:17:34.636077 | 900 10 | 1909575676721759820 | 2018-12-07 04:17:34.664295 | 1000 11 | 2646627613614353023 | 2018-12-08 04:17:34.692378 | 1100 12 | 6042820943135784349 | 2018-12-09 04:17:34.720272 | 1200 13 | 2568111303865075193 | 2018-12-10 04:17:34.748703 | 1300 14 | 4970994696257251037 | 2018-12-11 04:17:34.7767 | 1400 15 | 6850683294198565620 | 2018-12-12 04:17:36.528584 | 1500 (16 rows)
0
Comments
Hi Raghav, @RaghavA
I know this is an old post, but just wanted to give you some pointers that helped me a lot:
1. https://github.com/jackghm/Vertica/wiki/Vertica-Node-Data-Distribution
2. https://github.com/jackghm/Vertica/wiki/HP-Vertica-Tips,-Tricks,-and-Best-Practices
3. https://15721.courses.cs.cmu.edu/spring2019/papers/15-execution/shrinivas-icde2013.pdf
Hope these help!
Have fun with Vertica!