HP Vertica Parallel Export - Is it recommended?
Hi,
Our goal is to export a massive table (on the order of TB bytes stored) to CSV(s), to transfer it elsewhere for archival. I've been searching the HP Vertica site, but have not found what I am looking for. Much of the documentation assumes that the user wants to export from 1 Vertica cluster to another Vertica cluster.
However, many non-HP sites reference the "parallel export" UDF here:
https://marketplace.saas.hpe.com/big-data/content/parallel-export
Is this a good option for what I'm looking to do? Or is the Parallel Export UDF deprecated, and thus not actively promoted on the HP site.
- David
0
Comments
Looks like the larger file (391.8 KB) on https://marketplace.saas.hpe.com/big-data/content/parallel-export does not build - due to multiple definitions of a function: Basics::BigInt::binaryToDecScale(void const*, int, char*, int, int)'
/tmp/ccptMaF0.o: In function `Basics::BigInt::binaryToDecScale(void const*, int, char*, int, int)':
/opt/vertica/sdk/include/Vertica.cpp:552: multiple definition of `Basics::BigInt::binaryToDecScale(void const*, int, char*, int, int)'
The smaller file looks to be the one to download, which does build successfully.
After trying out Parallel Export, I would have to say "yes, with caveats".
The upside: You get the fastest possible way to export a table, with great flexibility. You avoid the inter-node network traffic that slows down a general purpose vsql select to file.
The downsides:
1. Slows down other clients seeking Vertica connections. If you're exporting a huge table, break it up into smaller pieces, and export those tables, so it doesn't interfere with existing processes.
2. if you have a custom command ("cmd") which writes to custom files, like sed, gzip, or anything, be sure to use $$ to indicate PID (process ID). Otherwise, the processes will collide with each other and overwrite each other's files.
Sample Code:
Hi, can i use aggregate function on column while exporting.i need to do sum(column) and then export the output.Is it possible?
As far as I know that Parallel Export UDX is no longer available.
However, if you have it already, you should be able to export a sum if you use a sub query...
Example: