What is the fastest way to insert the data in Vertica?

ashishtomer · January 2019

At our company, we've set up a three-node development cluster, which I am using to test inserting data into the database. I load the data into Vertica from .csv files. The programming language is Scala.

What is the fastest way to insert data into a Vertica database? In some cases our CSV files contain millions of rows, and the Vertica nodes run low on RAM. To make loading faster and more resource-efficient, I copy the data to all three nodes of the Vertica DB and ingest it there, so that the load is distributed across the whole cluster. I also limit parallel ingestion to two loads at a time.

Please advise on how I can improve the ingestion speed in Vertica.

If you could share a chart of ingestion time against the number of CSV rows, that would be a great help!

Jim_Knicely · February 2019

Check out the following link to the online doc discussing "Load Parallelism":

https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/ExtendingVertica/UDx/UDL/ParallelLoad.htm

In my experience, you can load each node as heavily as possible, but only up to a point. Vertica can't overcome a node's physical limitations (i.e., memory, CPU, disk throughput).

If you need faster load times, simply add more nodes!
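
Since the question mentions Scala, here is a minimal sketch of the setup described above: files already copied to each node, spread across the nodes round-robin, with at most two loads running at once. It assumes the `COPY ... FROM 'path' ON nodename` form of the COPY statement, so please check the COPY documentation for your Vertica version. The table name, node names, and the `execute` callback are hypothetical placeholders, not anything from this thread. The sketch only builds and schedules the statements; the caller supplies the function that actually runs each one (for example over JDBC).

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelCopySketch {
  // Build one COPY statement that reads a server-side CSV file on a given node.
  // Assumption: the file already exists at `path` on that node's filesystem.
  def copyStatement(table: String, path: String, node: String): String =
    s"COPY $table FROM '$path' ON $node DELIMITER ','"

  // Assign files to nodes round-robin so each node parses a share of the input.
  def plan(table: String, files: Seq[String], nodes: Seq[String]): Seq[String] =
    files.zipWithIndex.map { case (path, i) =>
      copyStatement(table, path, nodes(i % nodes.size))
    }

  // Run all statements with at most `maxParallel` in flight at a time.
  // `execute` is a hypothetical callback supplied by the caller, e.g. a
  // function that opens a JDBC connection and executes the statement.
  def runAll(statements: Seq[String], maxParallel: Int)(execute: String => Unit): Unit = {
    val pool = Executors.newFixedThreadPool(maxParallel)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val jobs = statements.map(sql => Future(execute(sql)))
      Await.result(Future.sequence(jobs), Duration.Inf)
      ()
    } finally pool.shutdown()
  }
}
```

With hypothetical node names such as `v_db_node0001` to `v_db_node0003`, calling `ParallelCopySketch.plan("public.events", files, nodes)` and passing the result to `runAll(statements, 2)(runOverJdbc)` would reproduce the "two ingestions at a time" limit. Raising `maxParallel` step by step while watching memory, CPU, and disk on each node is one way to find the point where a node stops keeping up.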
