COPY command and disk I/O above 70%

Hi everyone!
I have a CSV file that I want to load into a Vertica cluster (3 nodes).
The file contains 3 fields (integer, varchar(3), datetime). It is about 700 MB and has 10 million rows.
My table is segmented across all nodes by the first field (an integer identity column).
When I run the COPY command (with the DIRECT parameter), it takes a very long time to finish (15 minutes).
I also see that the cluster's disk I/O is above 70% (Management Console reports it as the cluster bottleneck).
Is the high I/O the problem, or is the load slowed down by something else?
How can I resolve this issue?
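To put the numbers above in perspective, it helps to turn them into a load rate. The sketch below is a minimal calculation that uses only the figures from the post (about 700 MB, 10 million rows, 15 minutes, 3 nodes). It assumes the 15 minutes is the full wall-clock time of the COPY, treats 700 MB as the raw file size, and assumes the identity-based segmentation spreads the data evenly, so the per-node figure is just the cluster rate divided by the node count. The function name `copy_throughput` is hypothetical.

```python
def copy_throughput(size_mb: float, rows: int, seconds: float, nodes: int) -> dict:
    """Return cluster-wide and per-node load rates for one COPY run.

    Assumes the data is spread evenly across nodes, so the per-node rate
    is simply the cluster rate divided by the node count.
    """
    mb_per_s = size_mb / seconds
    return {
        "cluster_mb_per_s": mb_per_s,
        "per_node_mb_per_s": mb_per_s / nodes,
        "rows_per_s": rows / seconds,
    }


# Figures taken from the question: ~700 MB, 10 million rows, 15 minutes, 3 nodes.
stats = copy_throughput(size_mb=700, rows=10_000_000, seconds=15 * 60, nodes=3)
for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```

With these inputs the result is roughly 0.78 MB/s for the whole cluster, about 0.26 MB/s per node, and about 11,100 rows/s. That is the baseline to compare against any "typical" rate or against the rate after tuning.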

Comments

  • edited March 2017

    Is there an estimate of the "typical" COPY load speed, in MB/s, on a single node?

  • I figured it out.

  • Can you share?

  • I use SSIS to build the ETL into Vertica.
    After adjusting the package's parameters (buffer size and so on), load performance went up and disk I/O on the cluster dropped to an acceptable 15%.
    However, the ADO provider's autocommit is a real pain (I would like to load all of the data first and only then commit).
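The "load everything, then commit once" pattern from the last comment can be sketched in Python as follows. This is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for a real Vertica connection, and it does not model SSIS, the ADO provider, or Vertica's COPY. The table name `events`, the helpers `batches` and `load_single_commit`, and the batch size of 10,000 are hypothetical. Here `batch_size` plays a role loosely similar to a buffer size: rows are sent in chunks, but no commit happens until every chunk has been inserted, and any error rolls back the whole load.

```python
import sqlite3
from datetime import datetime
from itertools import islice
from typing import Iterable, Iterator


def batches(rows: Iterable[tuple], size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_single_commit(conn: sqlite3.Connection, rows: Iterable[tuple], batch_size: int) -> int:
    """Insert all rows in batches inside one transaction; commit only at the end."""
    cur = conn.cursor()
    loaded = 0
    try:
        for chunk in batches(rows, batch_size):
            cur.executemany("INSERT INTO events VALUES (?, ?, ?)", chunk)
            loaded += len(chunk)
        conn.commit()  # single commit after every batch has been inserted
    except Exception:
        conn.rollback()  # nothing is kept if any batch fails
        raise
    return loaded


# In-memory stand-in table mirroring the post's 3 fields (integer, varchar(3), datetime).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, code VARCHAR(3), ts TEXT)")
conn.commit()

sample = ((i, "abc", datetime(2017, 3, 1).isoformat()) for i in range(25_000))
print(load_single_commit(conn, sample, batch_size=10_000))
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])
```

With a real driver, the part that matters is turning off per-statement autocommit and issuing one explicit commit. How to do that depends on the provider, so check its documentation rather than relying on this sketch.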
