Vertica can divide the work of loading data, taking advantage of parallelism to speed up the operation. One supported type of parallelism is called apportioned load.

An apportioned load divides a single large file or other single source into segments (portions), which are assigned to several nodes to be loaded in parallel.
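To make the idea of "portions" concrete, here is a minimal conceptual sketch in Python. It is not Vertica's implementation; the function name `portion_offsets` and the newline-delimited record format are assumptions made purely for illustration. It splits a byte buffer into ranges that always end on a record boundary, so each range could be parsed independently by a different worker.

```python
def portion_offsets(data: bytes, num_portions: int) -> list[tuple[int, int]]:
    """Split data into at most num_portions (start, end) byte ranges.

    Hypothetical helper for illustration only: every range ends just after
    a newline, so no record is cut in half between two portions.
    """
    if num_portions < 1:
        raise ValueError("num_portions must be at least 1")

    size = len(data)
    target = max(1, size // num_portions)  # rough size of each portion
    ranges = []
    start = 0

    # Build all but the last portion; the last one takes whatever remains.
    while start < size and len(ranges) < num_portions - 1:
        tentative_end = min(start + target, size)
        # Extend the portion to the end of the record it lands in.
        newline_at = data.find(b"\n", tentative_end - 1)
        end = size if newline_at == -1 else newline_at + 1
        ranges.append((start, end))
        start = end

    if start < size:
        ranges.append((start, size))
    return ranges


# Ten small records split across three hypothetical workers.
sample = b"".join(f"row{i}\n".encode() for i in range(10))
for start, end in portion_offsets(sample, 3):
    print(start, end, sample[start:end].count(b"\n"), "records")
```

Each worker in this sketch would read only its own byte range, which is the property that lets several nodes parse one file at the same time.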


I want to load a data file that contains 100,000,000 records.

dbadmin=> \! wc -l /home/dbadmin/big_data.txt
100000000 /home/dbadmin/big_data.txt

For my first load attempt, I’ll load the file from a single node in my three-node cluster.

dbadmin=> \timing
Timing is on.

dbadmin=> COPY big_data FROM '/home/dbadmin/big_data.txt' DIRECT;
Rows Loaded
(1 row)

Time: First fetch (1 row): 49078.222 ms. All rows formatted: 49078.268 ms

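Next, I’ll rerun the load, this time adding the ON ANY NODE option to the COPY command so that Vertica can perform an apportioned load. This option tells Vertica that the file is accessible at the same path on every node, so any node can open and parse it.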

dbadmin=> COPY big_data FROM '/home/dbadmin/big_data.txt' ON ANY NODE DIRECT;
Rows Loaded
(1 row)

Time: First fetch (1 row): 21141.006 ms. All rows formatted: 21141.045 ms

Wow! The apportioned load ran more than twice as fast as the single-node load!

dbadmin=> SELECT 100 - (21141.006 / 49078.222 * 100) || '%' PCT_FASTER;
(1 row)
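The same arithmetic can be checked outside the database. The short Python sketch below uses only the two timings reported above; the variable names are my own. It shows that the query computes the reduction in elapsed time, which is a different figure from the speedup ratio behind "more than twice as fast".

```python
# Timings taken from the two COPY runs above, in milliseconds.
single_node_ms = 49078.222
apportioned_ms = 21141.006

# Speedup ratio: how many times faster the apportioned load ran.
speedup = single_node_ms / apportioned_ms

# Reduction in elapsed time: the same formula as the PCT_FASTER query.
time_saved_pct = 100 - (apportioned_ms / single_node_ms * 100)

print(f"speedup: {speedup:.2f}x")
print(f"elapsed time reduced by: {time_saved_pct:.1f}%")
```

With these timings, the speedup is roughly 2.3x and the elapsed time drops by roughly 57%.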

