Can someone help me with a Python script to load a file into a Vertica table? Since it is a huge file, I need to load it with 4 parallel threads. Please help.
Why not take advantage of Apportioned Load?
The "cleanest" way to make apportioned load happen is to put the load file in a directory that is locally mounted under the exact same path on all existing Vertica nodes.
You transfer the uncompressed flat file to that directory, and it's immediately visible for all Vertica nodes with the same path name.
The file has to be uncompressed because, for apportioned load, each of the, say, 8 parsing threads positions itself at the beginning and end of "its own" eighth of the file using fseek(), and then advances byte by byte until it finds the next record delimiter, to determine its own portion.
With a compressed file, you can't do that.
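To make the positioning idea concrete, here is a minimal Python sketch of how a file could be cut into record-aligned byte ranges. It only illustrates the principle described above and is not Vertica's actual implementation; the function name `portion_ranges` and the newline record delimiter are assumptions made for the example.

```python
import os


def portion_ranges(path, portions, delimiter=b"\n"):
    """Return (start, end) byte offsets, one pair per portion.

    Hypothetical helper: each boundary is found by seeking to an even
    split point and then reading byte by byte until the next record
    delimiter, so no record is cut in half.
    """
    size = os.path.getsize(path)
    boundaries = [0]
    with open(path, "rb") as f:
        for i in range(1, portions):
            target = size * i // portions
            if target <= boundaries[-1]:
                continue  # previous boundary already passed this split point
            f.seek(target)
            # Advance one byte at a time until just past the next delimiter.
            while True:
                byte = f.read(1)
                if not byte or byte == delimiter:
                    break
            pos = f.tell()
            if boundaries[-1] < pos < size:
                boundaries.append(pos)
    boundaries.append(size)
    # Pair up consecutive boundaries into (start, end) ranges.
    return list(zip(boundaries[:-1], boundaries[1:]))
```

Seeking to an arbitrary byte offset like this only works on raw bytes, which is why the same trick is not available for a compressed file.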
I would recommend trying Apportioned Load (https://www.vertica.com/blog/faster-data-loads-with-apportioned-load-quick-tip/), the best possible way to have a Python script do the load. Hope this is of some use, friend.
Thanks all for your answers! The main constraint I have is defining the number of parallel threads.
If I have fewer than 1 billion records, then I would like to load with 6 parallel threads.
If I have more than 1 billion records, then I would like to load with 8 parallel threads, and the condition goes on.
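If you do want to drive the parallelism from Python yourself, a rough sketch of the thread-count rule could look like the following. The names `choose_thread_count`, `load_in_parallel` and `load_chunk` are made up for this example; `load_chunk` stands for whatever function you write to load one piece of the file (for instance, one that opens its own database connection and runs a COPY statement). Only the two tiers mentioned above are covered, and treating exactly 1 billion records as the 8-thread case is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

ONE_BILLION = 1_000_000_000


def choose_thread_count(record_count):
    """Pick the number of loader threads from the record count.

    Thresholds taken from the post: under 1 billion -> 6, otherwise 8.
    Exactly 1 billion falling into the 8-thread case is an assumption;
    further tiers would be added here.
    """
    return 6 if record_count < ONE_BILLION else 8


def load_in_parallel(chunks, record_count, load_chunk):
    """Call load_chunk(chunk) for every chunk on a pool of worker threads.

    load_chunk is a hypothetical, user-supplied callable that loads one
    piece of the file; results come back in the order of the chunks.
    """
    workers = choose_thread_count(record_count)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_chunk, chunks))
```

Keeping the rule in its own small function means the thresholds can change without touching the loading code.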
Apologies for the late reply! I was travelling.