Is there a better way to export data from a very big Vertica table to a flat file?
I'm trying to export data from a Vertica table (size 50 GB, 500 million rows, 82 fields) to a pipe-delimited flat file using vsql, without column headers. It has been running for more than 3 hours now and has exported 200 million of the 500 million rows.
We have even bigger tables (e.g., >1 TB, >3 billion rows, >50 fields) to unload from Vertica to flat files. Is there a better way to export such huge volumes of data from a table to a flat file? Please help.
Code snippet I'm using:
vsql -At -h VERT_SERVER -U UNAME -w PWD -d DB <<vsqlend 2>&1
\set ON_ERROR_STOP ON
\timing
\set
\o /file_path/file_name.dat
SELECT C1,C2,C3,C4,C5,...,C82 FROM table_500mln;
\q
vsqlend
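For reference, the same flat-file export can also be written as a single command, with the pipe delimiter stated explicitly and stdout redirected straight to the file. This is just a sketch of the approach above, using the same placeholder host, credentials, and paths:

# one-command variant of the export above; -F sets the field separator
# (with -A the default is already '|'), -q suppresses informational output
vsql -Atq -F '|' -h VERT_SERVER -U UNAME -w PWD -d DB \
  -c "SELECT * FROM table_500mln;" > /file_path/file_name.dat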
Answers
@sri_vtca Does the Parquet format work for you? If so, you can try EXPORT TO PARQUET; it works much faster than the method you are using now.
https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/SQLReferenceManual/Statements/EXPORTTOPARQUET.htm
https://www.vertica.com/blog/whats-new-vertica-8-1-1-introducing-export-parquet-format/
I second Sruthi's answer: export to Parquet will be much faster, and the Parquet files will also be far smaller than the flat-file representation of the same data. It would be as easy as running:
vsql -At -h VERT_SERVER -U UNAME -w PWD -d DB -c "export to parquet(directory='/shared-fs/table_500mln') as select * from table_500mln"
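Depending on your Vertica version, EXPORT TO PARQUET also accepts tuning parameters; the 10.0 documentation linked above lists, among others, compression (Snappy by default) and rowGroupSizeMB. A sketch with illustrative values (check the docs for your version before relying on them):

# same export with explicit compression and row-group size;
# the parameter values here are illustrative, not recommendations
vsql -At -h VERT_SERVER -U UNAME -w PWD -d DB \
  -c "export to parquet(directory='/shared-fs/table_500mln', compression='gzip', rowGroupSizeMB=512) as select * from table_500mln"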
You can also partition the exported data monthly on the month_col column:
vsql -At -h VERT_SERVER -U UNAME -w PWD -d DB -c "export to parquet(directory='/shared-fs/table_500mln') over (partition by month_col) as select * from table_500mln"
This way, managing the exported data becomes much easier in the long run, since you can run the export monthly.
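If you adopt that monthly cadence, each subsequent run only needs to export the newest month. A minimal sketch, assuming month_col holds values like '2020-01' (the column name comes from the example above; the literal is a placeholder):

# export a single month into its own directory; note that EXPORT TO PARQUET
# requires the target directory not to exist yet, so each run gets a fresh path
vsql -At -h VERT_SERVER -U UNAME -w PWD -d DB \
  -c "export to parquet(directory='/shared-fs/table_500mln/2020-01') as select * from table_500mln where month_col = '2020-01'"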