What is the fastest way to insert data into Vertica?
At our company we've set up a three-node development cluster, which I am using to test data insertion. I load the data into Vertica from .csv files, and the programming language is Scala.
I want to know the fastest way to insert data into the Vertica database. Some of our CSV files contain millions of rows, and the Vertica nodes run low on RAM. To make the load faster and more resource-efficient, I copy the files to all three nodes of the cluster and ingest them there, so that the load is distributed across the whole cluster. I also limit ingestion to 2 parallel loads at a time.
Please advise how I can improve the ingestion speed in Vertica.
If you could show a chart (number of CSV lines vs. ingestion time), that would be a great help!
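The "at most 2 loads at a time" limit described above can be sketched in Scala with a fixed-size thread pool. This is a minimal sketch, not the poster's code: `BoundedLoader` and `loadAll` are hypothetical names, and `load` is a placeholder for whatever actually issues the COPY (for example over JDBC).

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object BoundedLoader {
  // Runs `load` once per file, with never more than `maxParallel` loads in flight.
  // `load` is a hypothetical hook; in practice it would execute a COPY statement.
  def loadAll[A](files: Seq[String], maxParallel: Int)(load: String => A): Seq[A] = {
    val pool = Executors.newFixedThreadPool(maxParallel)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Each file becomes a task; the pool size caps concurrency.
      val futures = files.map(f => Future(load(f)))
      // Results come back in the same order as `files`.
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally {
      pool.shutdown()
      pool.awaitTermination(1, TimeUnit.MINUTES)
    }
  }
}
```

Raising `maxParallel` only helps while the nodes still have spare memory, CPU and disk throughput, so it is worth varying it while watching node RAM.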
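No measured numbers are available here, but one way to produce such a chart for your own cluster is to time each load and record (rows, seconds) pairs. A minimal sketch, assuming `load` returns the number of rows it loaded (e.g. the CSV line count); `LoadTimer` and `Sample` are hypothetical names.

```scala
object LoadTimer {
  // One measurement: how many rows were loaded and how long it took.
  final case class Sample(rows: Long, seconds: Double) {
    def rowsPerSecond: Double = if (seconds > 0) rows / seconds else Double.PositiveInfinity
  }

  // Times a single load; `load` is a hypothetical hook that returns the row count.
  def time(load: => Long): Sample = {
    val start = System.nanoTime()
    val rows = load
    Sample(rows, (System.nanoTime() - start) / 1e9)
  }

  // Renders samples as CSV, ready to plot as rows vs. seconds.
  def toCsv(samples: Seq[Sample]): String =
    ("rows,seconds" +: samples.map(s => s"${s.rows},${s.seconds}")).mkString("\n")
}
```

Running this over files of increasing size gives the "number of CSV lines vs. ingestion time" curve for your specific hardware.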
Comments
Check out the following link to the online doc discussing "Load Parallelism":
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/ExtendingVertica/UDx/UDL/ParallelLoad.htm
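Instead of running a separate load per node, Vertica's COPY can name a server-side file on each node in a single statement (`'path' ON nodename`), which lets the cluster load the parts in parallel. A minimal sketch that builds such a statement; the table name, node names (which should match `v_catalog.nodes`), file paths and `DELIMITER ','` option are assumptions for illustration, and `CopyBuilder` is a hypothetical helper.

```scala
object CopyBuilder {
  // Wraps a path in single quotes, doubling any embedded quotes (SQL literal escaping).
  private def quote(s: String): String = "'" + s.replace("'", "''") + "'"

  // Builds one COPY that loads one pre-staged file per node in a single statement.
  // `filesByNode` pairs a Vertica node name with a file path local to that node.
  def copyOnNodes(table: String, filesByNode: Seq[(String, String)]): String = {
    require(filesByNode.nonEmpty, "need at least one file")
    val sources = filesByNode.map { case (node, path) => s"${quote(path)} ON $node" }
    s"COPY $table FROM ${sources.mkString(", ")} DELIMITER ','"
  }
}
```

The resulting string can be run with plain JDBC (`connection.createStatement().execute(sql)`) using the Vertica JDBC driver; check the COPY options against the documentation for your Vertica version.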
In my experience, you load each node as heavily as it can take, up to a point. Vertica can't overcome the physical limitations of a node (memory, CPU, disk throughput).
If you need faster load times, simply add more nodes!