Tuning multiple copy commands in parallel
We have a single-node Vertica setup on which we execute multiple COPY load commands in parallel. We have set up a resource pool with a planned concurrency of 15 and have been firing 15 COPY commands in parallel. A single COPY command executes quickly, but as the number of parallel COPY commands increases, the load slows down. Are there any recommendations for tuning the default settings to speed up COPY commands that run in parallel?
How many parallel COPY commands (each loading a few million rows) are recommended?
Our use case is very simple:
The service keeps appending events to a CSV file; periodically we roll the file and fire a COPY command against the rolled CSV. We can have multiple services running, and they usually fire their COPY commands at Vertica at the same time.
Is there a faster way to load data into Vertica when we have a continuous stream of incoming events?
One of the columns is varchar(65000) - any tuning recommendation?
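For reference, the roll-and-load flow described above could look roughly like the following minimal sketch. It assumes the vertica_python client, whose cursors provide copy(sql, data) for COPY ... FROM STDIN; the helper names, the table name, and the file path are hypothetical and not taken from the actual service.

```python
import re


def build_copy_sql(table: str, delimiter: str = ",") -> str:
    """Build a COPY ... FROM STDIN statement for a delimited stream."""
    # The table name is interpolated into SQL text, so accept only a plain
    # identifier, optionally schema-qualified (for example public.events).
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if len(delimiter) != 1 or delimiter == "'":
        raise ValueError("delimiter must be a single character other than a quote")
    return f"COPY {table} FROM STDIN DELIMITER '{delimiter}'"


def load_rolled_file(cursor, table: str, csv_path: str) -> None:
    """Stream one rolled CSV file into Vertica through an open cursor."""
    with open(csv_path, "rb") as fh:
        # Assumption: cursor is a vertica_python cursor (copy(sql, data)).
        cursor.copy(build_copy_sql(table), fh)
```

Each service would call load_rolled_file(cursor, "events", "/path/to/rolled.csv") after rolling its file, with a cursor obtained from vertica_python.connect(...). Every such call is one COPY statement competing for the same resource pool.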
Answers
Which Vertica version are you on? Can you check whether EnableApportionLoad is enabled for parallel load streams?
In addition, check the recommendations mentioned here:
https://forum.vertica.com/discussion/comment/245482#Comment_245482
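One way to check the parameter is to query the CONFIGURATION_PARAMETERS system table. The sketch below assumes a DB-API style cursor (for example from vertica_python); the helper name is hypothetical, and it returns the raw values as reported rather than interpreting them.

```python
APPORTION_SQL = (
    "SELECT parameter_name, current_value, default_value "
    "FROM configuration_parameters "
    "WHERE parameter_name = 'EnableApportionLoad'"
)


def apportion_load_setting(cursor):
    """Return (current_value, default_value) for EnableApportionLoad.

    Returns None when the parameter is not listed at all.
    """
    cursor.execute(APPORTION_SQL)
    row = cursor.fetchone()
    if row is None:
        return None
    _, current, default = row
    return current, default
```

The same SELECT can be run directly in vsql if no Python client is at hand.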
Vertica 9.3. Does it matter for a single node setup too?
Can I use batch inserts for the continuous stream of incoming data? Will they perform as well as COPY?
I know a batch insert implicitly uses the COPY command, but I am not sure about the performance.
In a one-node cluster, parallel load performance depends on the number of cores (host processors), memory, and disk space. What is MAXCONCURRENCY set to? MAXCONCURRENCY sets the maximum number of concurrent queries that can run against the pool. PLANNEDCONCURRENCY is an estimate of the number of concurrent queries expected to run against the pool; it is used to calculate the query budget for the resource pool.
Plannedconcurrency - 12, Maxconcurrency - 24
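To illustrate how PLANNEDCONCURRENCY feeds into the query budget, here is a simplified arithmetic sketch. It assumes the budget is roughly the memory available to the pool divided by PLANNEDCONCURRENCY; the exact calculation in Vertica depends on the pool's MEMORYSIZE and MAXMEMORYSIZE settings and on the GENERAL pool, so treat the numbers as an approximation. The 24576 MB pool size is a made-up figure, and the function name is hypothetical.

```python
def approx_query_budget_mb(pool_memory_mb: float, planned_concurrency: int) -> float:
    """Approximate per-query memory budget for a resource pool.

    Simplification: memory available to the pool / PLANNEDCONCURRENCY.
    """
    if planned_concurrency < 1:
        raise ValueError("planned_concurrency must be at least 1")
    return pool_memory_mb / planned_concurrency


# Hypothetical pool with 24576 MB available to it.
POOL_MB = 24576
budget_at_12 = approx_query_budget_mb(POOL_MB, 12)  # 2048.0 MB per query
budget_at_24 = approx_query_budget_mb(POOL_MB, 24)  # 1024.0 MB per query
```

Under this approximation, raising PLANNEDCONCURRENCY gives each COPY a smaller budget, while lowering it gives each COPY more memory but lets fewer statements fit in the pool's memory at once.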