Full table export performance degradation over time
Hello,
I have a background job that exports a full table in Vertica, given a set of user-supplied filters. I am noticing that as the export continues further and further down the table, the query times get longer and longer.
I am paginating the result set with LIMIT and OFFSET, doing some processing, then writing to a file. Does anyone have any suggestions as to why this is happening and what I can do to fix it?
Thanks.
Comments
Hi,
There aren't really details on how you export the file. Are you running vsql -c "select ..." > file.txt, or are you using a tool to do that?
If you are doing a select * from ... to export, all the data is moved to the initiator node, which then has to write it out. That can be a lot of traffic, and the initiator ends up being your bottleneck. There is a free UDx in the marketplace that helps with exporting data: each node exports the data that it holds, which is much more efficient. This is the link to the function if you are interested:
https://marketplace.saas.hpe.com/big-data/content/parallel-export
If I did not understand the problem, let me know.
Eugenia
Basically, we were manually paginating through the results, which was determined to be the issue. We have since switched to the vsql > export.txt approach, which is MUCH better.
I will look at this parallel export though, this could speed things up even further!
Thank you
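For anyone who still has to page through results from application code: with LIMIT and OFFSET, the database typically has to produce and discard all the skipped rows on every request, so later pages cost more than earlier ones. One common alternative is keyset pagination, where each page starts after the last key seen. Below is a minimal sketch of that idea, not the approach used in this thread. It uses Python's built-in sqlite3 as a stand-in so it runs anywhere; the `events` table, its `id` and `payload` columns, and both function names are hypothetical, and it assumes the table has a unique, sortable key.

```python
import sqlite3


def make_demo_db(n):
    """Build an in-memory stand-in table (hypothetical schema) with n rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        ((i, f"row-{i}") for i in range(1, n + 1)),
    )
    return conn


def export_keyset(conn, page_size=1000):
    """Yield all rows in id order, one page at a time, without using OFFSET."""
    last_id = None
    while True:
        if last_id is None:
            # First page: no lower bound on the key yet.
            rows = conn.execute(
                "SELECT id, payload FROM events ORDER BY id LIMIT ?",
                (page_size,),
            ).fetchall()
        else:
            # Later pages: resume strictly after the last key already exported.
            rows = conn.execute(
                "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, page_size),
            ).fetchall()
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]


if __name__ == "__main__":
    conn = make_demo_db(2500)
    with open("export.txt", "w") as out:
        for row_id, payload in export_keyset(conn):
            out.write(f"{row_id}|{payload}\n")
```

Each page is bounded by a key comparison instead of a row count to skip, so the work per page should stay roughly constant when the key is sorted or indexed. If no processing between pages is needed, a single streamed query (as with vsql > export.txt) is still simpler.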
Hi Eugenia,
This comment looks a lot like my posting here:
https://community.dev.hpe.com/t5/Vertica-Forum/HP-Vertica-Parallel-Export-Is-it-recommended/m-p/238075/highlight/true#M13945
If this is the fastest way to safely export large tables, then I will try this out. However, if this is deprecated, and there is a better option, I'd love to know.
- David