COPY Files from NFS Mount
This is the use case I'm trying to solve: We have files coming from HDFS and we want to load them into a set of Vertica tables. I think we're going to opt for a solution that involves a flex table that we then parse into a set of persisted tables.
Here is my question: When we run the COPY command against the JSON files coming from Hadoop, can I put the files on an NFS share that is mounted on all the nodes in our Vertica cluster and do a COPY ON ANY NODE? Will that achieve parallelism across the entire cluster? What is the fastest method to load data as I've described above?
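For context, the pattern I have in mind looks roughly like this (the table names and mount path are placeholders, not our real ones):

```sql
-- Landing zone: a flex table that accepts the raw JSON as-is.
CREATE FLEX TABLE landing_json();

-- Load from the shared NFS mount; the glob matches the exported HDFS files.
COPY landing_json
FROM '/mnt/nfs/hdfs_export/*.json' ON ANY NODE
PARSER fjsonparser();

-- Then parse the keys we care about into a persisted columnar table.
INSERT INTO events
SELECT event_id::INT, event_ts::TIMESTAMP, payload::VARCHAR(1000)
FROM landing_json;
```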
Cheers,
Eric
Comments
Hi Eric,
Yes. What you've described will work. Also, use DIRECT to bypass the WOS.
https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/AdministratorsGuide/BulkLoadCOPY/UsingParallelLoadStreams.htm
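A minimal sketch of the load, assuming the flex table and mount path from your description (names are placeholders): ON ANY NODE lets the initiator apportion the matching files across the cluster, and DIRECT writes straight to ROS, bypassing the WOS.

```sql
-- The NFS share must be mounted at the same path on every node
-- for ON ANY NODE to work.
COPY landing_json
FROM '/mnt/nfs/hdfs_export/*.json' ON ANY NODE
PARSER fjsonparser()
DIRECT;
```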
Regards
Gayatri
Thanks for your response. Is this a design pattern you've used before? Just wondering if there is anything else I should be considering. The reason I'm pushing for this solution is so I don't have to copy the files to all nodes (i.e. reduce network traffic).
Yes. A lot of our customers use this approach and have been successful.
Thanks
Gayatri