Confirming if a Data Load was Apportioned

When running a COPY command, a parser can use more than one database node to load a single input source in parallel. This approach is referred to as apportioned load and is often much faster than loading from a single node.

One way of confirming that a data load took advantage of apportioning is to query the LOAD_SOURCES system table after running the COPY command. If the load was apportioned, you’ll get a list of nodes that participated!

Example:

[dbadmin@SE-Sandbox-43-node1 ~]$ vsql -ic "SELECT node_name, node_address FROM nodes;"
Timing is off.
    node_name    |  node_address
-----------------+----------------

v_mydb_node0001 | 192.168.61.227
v_mydb_node0002 | 192.168.61.228
v_mydb_node0003 | 192.168.61.229
(3 rows)

[dbadmin@SE-Sandbox-43-node1 ~]$ ssh 192.168.61.227 "ls -lrt /home/dbadmin/data2.txt"
-rw-r--r-- 1 dbadmin verticadba 184500036 Dec 11 22:01 /home/dbadmin/data2.txt

[dbadmin@SE-Sandbox-43-node1 ~]$ ssh 192.168.61.228 "ls -lrt /home/dbadmin/data2.txt"
-rw-r--r-- 1 dbadmin verticadba 184500036 Dec 11 22:03 /home/dbadmin/data2.txt

[dbadmin@SE-Sandbox-43-node1 ~]$ ssh 192.168.61.229 "ls -lrt /home/dbadmin/data2.txt"
-rw-r--r-- 1 dbadmin verticadba 184500036 Dec 11 22:03 /home/dbadmin/data2.txt

[dbadmin@SE-Sandbox-43-node1 ~]$ vsql -ic "COPY big_fact2 FROM '/home/dbadmin/data2.txt' ON ANY NODE; -- This load will be apportioned"
Timing is off.
Rows Loaded
-------------
     1500000
(1 row)

[dbadmin@SE-Sandbox-43-node1 ~]$ vsql -ic "SELECT node_name, source_name, input_size FROM load_sources WHERE table_name = 'big_fact2';"
Timing is off.
    node_name    |       source_name       | input_size
-----------------+-------------------------+------------
v_mydb_node0001 | /home/dbadmin/data2.txt |   30750006
v_mydb_node0001 | /home/dbadmin/data2.txt |   30750006
v_mydb_node0002 | /home/dbadmin/data2.txt |   30750006
v_mydb_node0002 | /home/dbadmin/data2.txt |   30750006
v_mydb_node0003 | /home/dbadmin/data2.txt |   30750006
v_mydb_node0003 | /home/dbadmin/data2.txt |   30750006
(6 rows)

Note that the original file size on each node is 184,500,036 bytes. Vertica split the load up into 6 pieces each 30,750,006 bytes in size.

[dbadmin@SE-Sandbox-43-node1 ~]$ vsql -ic "SELECT 184500036 / 30750006 apportioned_pieces;"
Timing is off.
  apportioned_pieces
----------------------
6.000000000000000000
(1 row)

Helpful Links:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/ExtendingVertica/UDx/UDL/ApportionedLoad.htm
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/SystemTables/MONITOR/LOAD_SOURCES.htm

Have fun!

Sign In or Register to comment.