Named pipe in Vertica

mayank_gupta · May 2013

If named pipes were supported in the COPY command, it would save a lot of server space and make data loading faster.

[Deleted User] · May 2013

Hi Mayank, that's a great idea. In fact, it's so great that we've implemented it :-) Named pipe support is available in all versions of Vertica that we currently support, and in some past versions as well. Just pass in the path to the named pipe as if it were a file. Vertica will detect that it's a named pipe and will proceed accordingly.

However, named pipes are tricky to use correctly. We've seen many issues in the field that stemmed from scripts using named pipes where the processes connecting to the named pipes didn't exit and left the pipe open, where the processes didn't start and left Vertica hanging waiting for data, etc.

So, if possible, we strongly recommend that you either pipe data into vsql and use vsql's "COPY ... FROM STDIN" feature, or that you let Vertica run and manage the remote program directly, via the open-source shell_load_package: https://github.com/vertica/Vertica-Extension-Packages/
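For anyone scripting the "pipe data into vsql" route, here is a minimal Python sketch. It assumes vsql is on the PATH and that connection options are supplied separately (they are omitted here); my_table is a hypothetical table name and stream_rows is an illustrative helper, not part of any Vertica package. The demo call uses cat as a stand-in for vsql so the sketch runs on any Unix machine.

```python
import subprocess

def stream_rows(rows, command):
    """Write '|'-delimited rows to the stdin of `command`, then wait for it to exit."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, text=True)
    try:
        for row in rows:
            proc.stdin.write("|".join(str(value) for value in row) + "\n")
    finally:
        # Closing stdin sends end-of-file, which tells the loader the data is complete.
        proc.stdin.close()
    return proc.wait()

# Assumed invocation: hypothetical table name, connection options omitted.
VSQL_COPY = ["vsql", "-c", "COPY my_table FROM STDIN DELIMITER '|'"]

# Demo with `cat` standing in for vsql; swap in VSQL_COPY against a real database.
exit_code = stream_rows([(1, 2), (2, 3)], ["cat"])
```

Because the loader is a child process of the script, it cannot be left waiting on a writer that never started, which is the failure mode described above for named pipes.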

mayank_gupta · May 2013

Actually, this STDIN form is different from named pipe files. With STDIN, we have to pass the data inline after the COPY command:

    COPY TABLE FROM STDIN DELIMITER '|';
    >1|2
    >2|3
    >\.

Something like this. But with a named pipe file it would be:

    COPY TABLE FROM 'file' DELIMITER '|';

where file is a named pipe that was created using mkfifo file:

    $] ls -ltr
    prw-r--r-- 1 root root 0 2013-03-25 12:06 file

Regards, Mayank
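As a side note, the leading "p" in that ls listing is what marks the file as a named pipe. A small Python sketch of the same check, assuming a Unix system; make_fifo is an illustrative helper name and the path is a throwaway temp directory:

```python
import os
import stat
import tempfile

def make_fifo(path):
    """Create a named pipe at `path` and report how the filesystem describes it."""
    os.mkfifo(path, 0o644)                # same effect as `mkfifo file` in the shell
    mode = os.stat(path).st_mode
    # S_ISFIFO is the programmatic version of the leading "p" in `ls -l` output.
    return stat.S_ISFIFO(mode), stat.filemode(mode)

workdir = tempfile.mkdtemp()
is_fifo, mode = make_fifo(os.path.join(workdir, "file"))
print(is_fifo, mode)   # the mode string starts with "p"; the rest depends on the umask
```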

[Deleted User] · May 2013

Hi Mayank, you are correct that these are different forms. What happens when you try to use a named pipe?

mayank_gupta · May 2013

Actually, it takes the same time as doing it in two steps: 1) transfer to a flat file, 2) then load. But with a named pipe, the load should start at the same time as the transfer.

[Deleted User] · May 2013

So it sounds like what you're saying is, that we do support named pipes, but you're not getting the performance with them that you would expect?

Vertica does start parsing data from named pipes as soon as data is available. It does not wait for the command on the other side of the pipe to complete. In fact, we have to start at the same time as the command. That's how named pipes work -- you can't just wait for them to finish; Linux won't allow it. In order for your command to be able to write any data to the pipe, Vertica has to be reading from the pipe continuously.

Of course, if your command doesn't write any data to the pipe, then we can't read it :-) Make sure that your command is trying to write its data to the pipe immediately. Some commands don't write anything until they're finished executing; then there's nothing for Vertica to read.

Also, if you're loading only a few rows (even only a few hundred rows), then it will take longer to start all the commands than it will take Vertica to load the rows. So the performance difference from using pipes will be too small to notice. Try loading more rows per COPY statement -- I usually start with a million rows at a time and tune from there.
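The pipe behaviour described here can be checked without Vertica. Below is a rough Python sketch, assuming a Unix system: a plain file reader stands in for the COPY side, and a thread stands in for the command feeding the pipe. The function name, row format, and delay are made up for illustration.

```python
import os
import tempfile
import threading
import time

def fifo_demo(n_rows=3, delay=0.2):
    """Return the rows read plus when the first row arrived and when the writer finished."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "load.fifo")
    os.mkfifo(path)
    start = time.monotonic()
    times = {}

    def writer():
        # open() blocks here until a reader has opened the other end of the pipe.
        with open(path, "w") as pipe:
            for i in range(n_rows):
                pipe.write(f"{i}|{i + 1}\n")
                pipe.flush()              # without this, rows sit in the writer's buffer
                time.sleep(delay)         # simulate a slow producer
        times["writer_done"] = time.monotonic() - start

    thread = threading.Thread(target=writer)
    thread.start()

    rows = []
    with open(path) as pipe:              # stand-in for the COPY reader
        for line in pipe:
            if not rows:
                times["first_row"] = time.monotonic() - start
            rows.append(line.rstrip("\n"))
    thread.join()

    os.remove(path)
    os.rmdir(workdir)
    return rows, times["first_row"], times["writer_done"]

rows, first_row, writer_done = fifo_demo()
print(rows, first_row < writer_done)
```

The reader sees the first row well before the writer finishes, but only because the writer flushes after each row. Remove the flush() call and all rows arrive together when the writer closes the pipe, which is the "nothing for Vertica to read" case.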
