Confirming if a Data Load was Apportioned
When running a COPY command, a parser can use more than one database node to load a single input source in parallel. This approach is referred to as apportioned load and is often much faster than loading from a single node.
One way of confirming that a data load took advantage of apportioning is to query the LOAD_SOURCES system table after running the COPY command. If the load was apportioned, you’ll get a list of nodes that participated!
Example:
[dbadmin@SE-Sandbox-43-node1 ~]$ vsql -ic "SELECT node_name, node_address FROM nodes;" Timing is off. node_name | node_address -----------------+---------------- v_mydb_node0001 | 192.168.61.227 v_mydb_node0002 | 192.168.61.228 v_mydb_node0003 | 192.168.61.229 (3 rows) [dbadmin@SE-Sandbox-43-node1 ~]$ ssh 192.168.61.227 "ls -lrt /home/dbadmin/data2.txt" -rw-r--r-- 1 dbadmin verticadba 184500036 Dec 11 22:01 /home/dbadmin/data2.txt [dbadmin@SE-Sandbox-43-node1 ~]$ ssh 192.168.61.228 "ls -lrt /home/dbadmin/data2.txt" -rw-r--r-- 1 dbadmin verticadba 184500036 Dec 11 22:03 /home/dbadmin/data2.txt [dbadmin@SE-Sandbox-43-node1 ~]$ ssh 192.168.61.229 "ls -lrt /home/dbadmin/data2.txt" -rw-r--r-- 1 dbadmin verticadba 184500036 Dec 11 22:03 /home/dbadmin/data2.txt [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -ic "COPY big_fact2 FROM '/home/dbadmin/data2.txt' ON ANY NODE; -- This load will be apportioned" Timing is off. Rows Loaded ------------- 1500000 (1 row) [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -ic "SELECT node_name, source_name, input_size FROM load_sources WHERE table_name = 'big_fact2';" Timing is off. node_name | source_name | input_size -----------------+-------------------------+------------ v_mydb_node0001 | /home/dbadmin/data2.txt | 30750006 v_mydb_node0001 | /home/dbadmin/data2.txt | 30750006 v_mydb_node0002 | /home/dbadmin/data2.txt | 30750006 v_mydb_node0002 | /home/dbadmin/data2.txt | 30750006 v_mydb_node0003 | /home/dbadmin/data2.txt | 30750006 v_mydb_node0003 | /home/dbadmin/data2.txt | 30750006 (6 rows)
Note that the original file size on each node is 184,500,036 bytes. Vertica split the load up into 6 pieces each 30,750,006 bytes in size.
[dbadmin@SE-Sandbox-43-node1 ~]$ vsql -ic "SELECT 184500036 / 30750006 apportioned_pieces;" Timing is off. apportioned_pieces ---------------------- 6.000000000000000000 (1 row)
Helpful Links:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/ExtendingVertica/UDx/UDL/ApportionedLoad.htm
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/SystemTables/MONITOR/LOAD_SOURCES.htm
Have fun!