Slow performance for COPY LOCAL command, it uses only one machine in the cluster!
Hi,
I am copying data from 1000 local files, about 1.7 GB in total, to a three-node Vertica cluster. I use this command:
COPY toy_edge from LOCAL '/home/dbadmin/toyGraph*' DELIMITER ' ';
I created the database on all nodes using:
admintools -t create_db -s ip1,ip2,ip3 -d graphDB
The process is taking a long time. It has been almost 1 hour and the data is still not loaded. Moreover, I noticed that only machine ip1 is busy (CPU=100%); the other two machines are idle.
Kindly note that the files exist only on machine ip1. However, I expected Vertica to transfer some of the data to the other machines so that it is stored there as well.
Thanks,
-Khaled
Comments
Hi,
To get a parallel load, you need to either put your files on an NFS share that every node can access, or split the files manually and copy a subset to each node in the cluster.
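As a rough sketch of what that could look like (not tested on your cluster): once the files are on the nodes themselves, drop the LOCAL keyword and tell COPY which node should read which path. The node names below assume Vertica's default naming for a database called graphDB, and /mnt/shared is a made-up mount point, so adjust both to your setup.

```sql
-- Check the actual node names first.
SELECT node_name FROM nodes;

-- Option 1: files split manually, one subset copied to the same
-- directory on each node. Each node parses only its own subset.
-- Node names are assumed defaults; replace with the output above.
COPY toy_edge
FROM '/home/dbadmin/toyGraph*' ON v_graphdb_node0001,
     '/home/dbadmin/toyGraph*' ON v_graphdb_node0002,
     '/home/dbadmin/toyGraph*' ON v_graphdb_node0003
DELIMITER ' ';

-- Option 2: all files on a shared NFS mount visible to every node.
-- /mnt/shared is a hypothetical path.
COPY toy_edge
FROM '/mnt/shared/toyGraph*' ON ANY NODE
DELIMITER ' ';
```

With COPY LOCAL the client streams every file to the node it is connected to, and that node does the parsing, which would explain why only ip1 is busy.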
Thanks
Can I use HDFS instead of NFS? How?
Thanks,
-Khaled
Yes, you can, though I am not sure about the performance. See below:
https://my.vertica.com/docs/7.1.x/HTML/Content/Authoring/HadoopIntegrationGuide/HDFSConnector/LoadingDataFromHDFS.htm
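A minimal sketch of a load through the HDFS Connector, assuming the connector is installed and WebHDFS is enabled on the Hadoop cluster. The host name, port, HDFS path, and user name are placeholders, not values from your environment.

```sql
-- Load through the HDFS Connector's Hdfs() source (reads via WebHDFS).
-- namenode.example.com, port 50070, the HDFS path and the username
-- are all hypothetical; substitute your own.
COPY toy_edge
SOURCE Hdfs(url='http://namenode.example.com:50070/webhdfs/v1/user/dbadmin/toyGraph*',
            username='dbadmin')
DELIMITER ' ';
```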
Thanks