COPY command and disk IO >70%

Dedyyshka · March 2017

Hi everyone!
I have a CSV file that I want to load into a Vertica cluster (3 nodes).
The file contains 3 fields (integer, varchar(3), datetime) and is ~700 MB with 10 million rows.
My table is segmented across all nodes by the first field (an integer identity column).
When I run the COPY command (with the DIRECT parameter),
it takes a very long time to execute (15 minutes).
I also see that the cluster's disk IO is above 70% (Management Console reports it as a "cluster bottleneck").
Is the high IO the problem, or is the load slowed down by something else?
How can I resolve this issue?

Dedyyshka · March 2017

Is there an estimate of the "typical" COPY load speed in MB/s on a single node?
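As a reference point for that question, the figures from the original post (~700 MB, 10 million rows, ~15 minutes, 3 nodes) can be turned into an observed throughput. The small Python sketch below is not from the thread; it assumes the rows are spread evenly across the 3 nodes by segmentation and that 1 MB = 10^6 bytes, and the function name `copy_throughput` is hypothetical. It only computes what was observed; it does not claim what a "typical" Vertica COPY rate is.

```python
def copy_throughput(size_mb: float, rows: int, seconds: float, nodes: int) -> dict:
    """Compute observed load rates for a single COPY run (hypothetical helper)."""
    mb_per_s = size_mb / seconds
    return {
        "mb_per_s": mb_per_s,
        # Assumes segmentation spreads the data evenly over all nodes.
        "mb_per_s_per_node": mb_per_s / nodes,
        "rows_per_s": rows / seconds,
        # Assumes 1 MB = 10^6 bytes.
        "bytes_per_row": size_mb * 1_000_000 / rows,
    }


# Figures from the question: ~700 MB, 10 million rows, ~15 minutes, 3 nodes.
stats = copy_throughput(size_mb=700, rows=10_000_000, seconds=15 * 60, nodes=3)
for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```

This works out to roughly 0.78 MB/s for the whole cluster (about 0.26 MB/s per node, ~11,000 rows/s), which is a useful baseline to compare against after tuning the load.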

Dedyyshka · March 2017

I figured it out.

ckotsidimos · March 2017

Can you share?

Dedyyshka · March 2017

I use SSIS to build the ETL into Vertica.
After adjusting the package's parameters (buffer size and so on), load performance increased and the cluster's disk IO dropped to an acceptable 15%.
However, the ADO provider's autocommit is still a lot of pain (I would like to load all the data first and only then commit).
