Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

Differences in load times to vertica DB.How to improve the speed during the writing of data?

Hi all,

 

I noticed that the load times between 1 node cluster vs 3 nodes cluster are completely different. In the fact if I load the data (ca. 30 Mio. rows) to the 1 node cluster it is circa 6 time faster as it is done, when I try to load the same data to 3 node cluster. I know that 3 nodes cluster needs some time because of replication of data, buddy projects etc. but the time difference is to big actually(mostly I need ca. 9 hours and above for load the data into the 3 nodes cluster and just 1,5 h into the 1 node cluster).

 

I need to find some solution to improve the loads into the 3 nodes cluster, do you now any steps what can be done to make the load faster?

 

Hereby my infrastructure:

1 node cluster (CentOs 6.7, 8 vCPU, 32 GB RAM)

3 nodes cluster (each node with: CentOS 6.7, 8 vCPU, 32 GB RAM, the nodes see each other and comunicate each other without any problem)

 

Thanks you for any tips and help.

 

Best,

Konrad

Comments

  • Is that 30 million rows?  If yes, something is very wrong.  How are you loading the data - with COPY or with individual INSERT statements?  Any sequences or similar involved - with small cache values?

     

      --Sharon

     

  • yes it is "just" 30 millions rows, that I'm writing via "vertica bulk loader" from pentaho/kettle.

    It is set up to write the rows via COPY and the data should be saved directly in ROS.

     

    I think this is something with the 3 nodes cluster settings, but I don't get really what.

    The whole cluster is setup in the cloud (this is already some kind of cluster) and based on three virtual maschines. So the infrastructure isn't physical in this case.

     

  • You could start by querying the LOAD_STREAMS table while the load is in progress to see where the time is being spent - whether it's spending more time parsing or sorting.  The EXECUTION_ENGINE_PROFILES table provides more detailed insight into where the time is being spent, though that's more "advanced".

     

    If you are questioning the health of the environment, also run /opt/vertica/bin/vnetperf, vioperf, and vcpuperf to make sure that the hosts meet the recommended criteria.

     

      --Sharon

     

  • Hi Sharon, thanks for the tip with this vertica checker.

     

    I've run them through and everything looks OK, moreover I've run the load from some another server within the same job and the load times problems don't occur within this connection.

     

    I think that my pentaho instalation is broken in some way, so need to investigate this side of infrastructure.

     

    Thank you very much

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.