HP Vertica Parallel Export - Is it recommended?

 

Hi,

 

Our goal is to export a massive table (on the order of terabytes stored) to CSV file(s) so we can transfer it elsewhere for archival.  I've been searching the HP Vertica site but haven't found what I'm looking for; much of the documentation assumes that the user wants to export from one Vertica cluster to another Vertica cluster.
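
For context, the plain single-stream approach would be to spool a SELECT to a file from vsql, roughly like the sketch below (the output path and table name are placeholders):

    -- One connection, with all data funneled through the initiator node
    \a
    \t
    \f '|'
    \o /data/export/my_big_table.csv
    SELECT * FROM my_schema.my_big_table;
    \o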

 

However, many non-HP sites reference the "parallel export" UDF here:

 

https://marketplace.saas.hpe.com/big-data/content/parallel-export

 

Is this a good option for what I'm looking to do?  Or is the Parallel Export UDF deprecated, and thus not actively promoted on the HP site?

 

- David

Comments

  •  

    Looks like the larger file (391.8 KB) on https://marketplace.saas.hpe.com/big-data/content/parallel-export does not build, due to multiple definitions of the function Basics::BigInt::binaryToDecScale(void const*, int, char*, int, int):

     

    /tmp/ccptMaF0.o: In function `Basics::BigInt::binaryToDecScale(void const*, int, char*, int, int)':
    /opt/vertica/sdk/include/Vertica.cpp:552: multiple definition of `Basics::BigInt::binaryToDecScale(void const*, int, char*, int, int)'

     

    The smaller file looks to be the one to download, which does build successfully.
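
    In case it's useful to others, a typical Vertica C++ UDx build and registration looks roughly like the sketch below; the source/output file names, library path, and the factory name 'ParallelExportFactory' are guesses, so adjust them to what the package actually ships. The "multiple definition" error above looks like Vertica.cpp getting compiled twice, so make sure it appears only once in the build.

    # Hypothetical build: compile the UDx source against the SDK.
    # If the package already bundles Vertica.cpp, drop it from this command.
    g++ -O3 -fPIC -shared -I /opt/vertica/sdk/include \
        -o ParallelExport.so ParallelExport.cpp /opt/vertica/sdk/include/Vertica.cpp

    Then, from vsql:

    CREATE LIBRARY ParallelExportLib AS '/home/dbadmin/ParallelExport.so';
    CREATE TRANSFORM FUNCTION ParallelExport AS LANGUAGE 'C++'
      NAME 'ParallelExportFactory' LIBRARY ParallelExportLib;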

     

     

  •  

    After trying out Parallel Export, I would have to say "yes, with caveats".

     

    The upside: you get the fastest possible way to export a table, with great flexibility.  You also avoid the inter-node network traffic that slows down a general-purpose vsql select to a file.

     

    The downsides:

     

    1. It slows down other clients seeking Vertica connections.  If you're exporting a huge table, break it up into smaller pieces and export those separately, so it doesn't interfere with existing processes (see the sketch at the end of this comment).

     

    2. If you have a custom command ("cmd") that writes to its own files (e.g., piping through sed or gzip), be sure to use $$ in the file name to include the PID (process ID).  Otherwise the parallel processes will collide and overwrite each other's files.
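
    To illustrate point 1, here is a sketch of slicing the export; the columns, date predicate, and output path are hypothetical, and separator/cmd work as in the sample code below:

    -- Export one slice per run so a single huge export doesn't hog connections;
    -- repeat with the next date range until the whole table is archived.
    SELECT ParallelExport(id, payload
           USING PARAMETERS separator='|',
           cmd='gzip -9 -c - > /data/export/my_table.2018_01.$$.gz')
    OVER (PARTITION BEST)
    FROM test.my_table_here
    WHERE event_date >= '2018-01-01' AND event_date < '2018-02-01';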

     

     

  • Sample Code:

     

    \timing
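    -- The cmd pipeline escapes backslashes, rewrites the __REPLACE__ placeholder
    -- as an escaped '|', then gzips the stream; $$ keeps each process's file unique.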
    SELECT ParallelExport("blah", "ha", "field", "field2"
    USING PARAMETERS separator='|', cmd='sed ''s/\\/\\\\/g'' | sed ''s/__REPLACE__/\\|/g'' | gzip -9 -c - > /some/dir/which/exists/on/all/vertica/machines/test.my_table_here.$$.gz')
    OVER (PARTITION BEST) FROM test.my_table_here;
  • Hi, can I use an aggregate function on a column while exporting? I need to do sum(column) and then export the output. Is that possible?

  • Jim_Knicely - Administrator
    edited June 2018

    As far as I know, that Parallel Export UDX is no longer available.

    However, if you already have it, you should be able to export a sum if you use a subquery...

    Example:

    select ParallelExport(c1_sum
     using parameters
     path='/tmp/test.csv.${nodeName}', separator=',')
    over (partition auto) from (select sum(c1) c1_sum from some_table);
    
