Is COPY FROM LOCAL STDIN multithreaded?
We are using COPY FROM LOCAL in Vertica 7 to load data that we stream from HDFS. Our concern is that the load speed will be limited by STDIN. We have two questions:
1. Is Vertica COPY FROM LOCAL multithreaded?
2. If so, is the overall speed still limited by STDIN? For example, in the extreme case where we stream 1 record per minute, even a multithreaded COPY would be of little use.
We tried running the COPY command directly against the WebHDFS source, but that doesn't allow us to write exceptions and rejected data rows to our local filesystem.
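For reference, the setup described above can be sketched roughly as follows. This is a minimal illustration, not our exact script: the helper names, the table name, the `|` delimiter, and the local file paths are all assumptions, and vsql connection options are omitted. It pipes `hdfs dfs -cat` into vsql running `COPY ... FROM LOCAL STDIN`, with `REJECTED DATA` and `EXCEPTIONS` pointing at files on the client side.

```python
import subprocess


def build_copy_statement(table, rejected_path, exceptions_path, delimiter="|"):
    """Build a COPY ... FROM LOCAL STDIN statement (hypothetical helper).

    The rejected-data and exceptions paths are meant to be local files on
    the machine where vsql runs.
    """
    return (
        f"COPY {table} FROM LOCAL STDIN "
        f"DELIMITER '{delimiter}' "
        f"REJECTED DATA '{rejected_path}' "
        f"EXCEPTIONS '{exceptions_path}'"
    )


def stream_hdfs_file_to_vertica(hdfs_path, statement):
    """Pipe one HDFS file into vsql's stdin and return vsql's exit code.

    Assumes `hdfs` and `vsql` are on PATH and that vsql can connect
    without extra options (an assumption, not something this thread confirms).
    """
    producer = subprocess.Popen(
        ["hdfs", "dfs", "-cat", hdfs_path], stdout=subprocess.PIPE
    )
    consumer = subprocess.Popen(["vsql", "-c", statement], stdin=producer.stdout)
    # Close our copy of the pipe so the producer sees SIGPIPE if vsql exits early.
    producer.stdout.close()
    consumer.wait()
    producer.wait()
    return consumer.returncode
```

A call would look like `stream_hdfs_file_to_vertica("/data/events/part-00000", build_copy_statement("public.events", "/tmp/events.rejected", "/tmp/events.exceptions"))`, where every path and the table name are placeholders. In this arrangement the load can never run faster than the single pipe delivers rows, which is the bottleneck we are asking about.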
Comments
I am doing a lot of import testing right now. I think the ingest part of COPY ... FROM STDIN runs on a single thread until the commit happens. That makes sense, unless you were starting an app that wrote to stdout from multiple threads and piped it to vsql?
When you do a COPY SOURCE HDFS() you should end up with files in /vertica/data/[dbname]/[catalog]/CopyErrorLogs/ that show the exceptions and the rejected data.
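If a single STDIN stream really is ingested by one thread, one workaround to try is several independent COPY streams at once, one per input file, each with its own vsql session. The sketch below shows only the fan-out; `load_in_parallel` and the `load_one` callable (something that pipes one file into its own `COPY ... FROM LOCAL STDIN`) are hypothetical names, and I have not measured whether this actually raises throughput.

```python
from concurrent.futures import ThreadPoolExecutor


def load_in_parallel(hdfs_paths, load_one, max_streams=4):
    """Run load_one(path) for each path on up to max_streams worker threads.

    load_one is a caller-supplied callable (hypothetical here) that loads a
    single file through its own vsql session and returns a status value.
    Returns a dict mapping each path to the value load_one returned for it.
    """
    paths = list(hdfs_paths)
    with ThreadPoolExecutor(max_workers=max_streams) as pool:
        # pool.map preserves input order, so results line up with paths.
        results = list(pool.map(load_one, paths))
    return dict(zip(paths, results))
```

Threads are enough here because each worker would spend its time waiting on external processes rather than doing work in Python. Give each stream its own rejected-data and exceptions file names, otherwise the streams would write to the same files.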