Most efficient way to insert a huge amount of data into Vertica

edited March 2019 in General Discussion


I'm looking for a way to load 20B events per day into my Vertica database.
So far, the fastest approach I have found (in terms of both speed and thread count) is COPY with a custom UDx source, the default delimited extractor, and a pinned projection on the target global temp table.

Along the way I accidentally noticed that no query runs on fewer than 4 threads. Even select 1 reports 4 worker threads:

explain verbose select 1;

Estimated resources for plan:

Scratch Memory MB: 0
File Handles: 0
Worker Threads: 4
Blocking Threads: 0
Externalizing Ops: 0
Unbounded Mem Ops: 0
Max Threads: 56

1) What are these threads used for? Can I reduce the count to 1? I have already set executionparallelism to 1.
2) Is there any way to avoid resegmentation on COPY other than guessing the resulting segment at the source or using a pinned projection? Or is 220 threads = 4 (initiator threads) + 4*18 (initiator blocking threads) + 8*18 (executor threads) for a single COPY the best I can get? It could be 10 times fewer if I could somehow avoid resegmentation on COPY.
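
To make the thread arithmetic in question 2 explicit, here is a small Python sketch. The function and parameter names are made up for illustration, and treating 18 as a single fan-out factor applied to both the blocking and executor threads is my reading of the formula, not something Vertica reports under that name.

```python
def copy_thread_estimate(fanout: int,
                         initiator_threads: int = 4,
                         blocking_per_unit: int = 4,
                         executor_per_unit: int = 8) -> int:
    """Total threads for one COPY, following the breakdown in the post:
    initiator threads + blocking threads * fanout + executor threads * fanout.
    All names here are hypothetical labels for the numbers in the question."""
    return (initiator_threads
            + blocking_per_unit * fanout
            + executor_per_unit * fanout)


total = copy_thread_estimate(18)
print(total)        # 4 + 72 + 144 = 220
print(total // 10)  # 22, the rough target if resegmentation could be avoided
```

With these numbers the two terms that scale with the fan-out account for 216 of the 220 threads, which is why avoiding resegmentation looks like the main lever.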
