HP Vertica Parallel Export - Is it recommended?




Our goal is to export a massive table (on the order of TB bytes stored) to CSV(s), to transfer it elsewhere for archival.  I've been searching the HP Vertica site, but have not found what I am looking for.  Much of the documentation assumes that the user wants to export from 1 Vertica cluster to another Vertica cluster.


However, many non-HP sites reference the "parallel export" UDF here:




Is this a good option for what I'm looking to do?  Or is the Parallel Export UDF deprecated, and thus not actively promoted on the HP site.


- David






  • Options


    Looks like the larger file (391.8 KB) on https://marketplace.saas.hpe.com/big-data/content/parallel-export does not build - due to multiple definitions of a function: Basics::BigInt::binaryToDecScale(void const*, int, char*, int, int)'


    /tmp/ccptMaF0.o: In function `Basics::BigInt::binaryToDecScale(void const*, int, char*, int, int)':
    /opt/vertica/sdk/include/Vertica.cpp:552: multiple definition of `Basics::BigInt::binaryToDecScale(void const*, int, char*, int, int)'


    The smaller file looks to be the one to download, which does build successfully.



  • Options


    After trying out Parallel Export, I would have to say "yes, with caveats".


    The upside: You get the fastest possible way to export a table, with great flexibility.  You avoid the inter-node network traffic that slows down a general purpose vsql select to file.


    The downsides:


    1. Slows down other clients seeking Vertica connections.  If you're exporting a huge table, break it up into smaller pieces, and export those tables, so it doesn't interfere with existing processes.


    2. if you have a custom command ("cmd") which writes to custom files, like sed, gzip, or anything, be sure to use $$ to indicate PID (process ID).  Otherwise, the processes will collide with each other and overwrite each other's files.



  • Options

    Sample Code:


    SELECT ParallelExport("blah", "ha", "field", "field2"
    USING PARAMETERS separator='|', cmd='sed ''s/\\/\\\\/g'' | sed ''s/__REPLACE__/\\|/g'' | gzip -9 -c - > /some/dir/which/exists/on/all/vertica/machines/test.my_table_here.$$.gz')
    OVER (PARTITION BEST) FROM test.my_table_here;
  • Options

    Hi, can i use aggregate function on column while exporting.i need to do sum(column) and then export the output.Is it possible?

  • Options
    Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited June 2018

    As far as I know that Parallel Export UDX is no longer available.

    However, if you have it already, you should be able to export a sum if you use a sub query...


    select ParallelExport(c1_sum)
     using parameters
     path='/tmp/test.csv.${nodeName}', separator=',')
    over (partition auto) from (select sum(c1) c1_sum from some_table);

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file