Vertica cluster with unbalanced work load cross nodes

John_Y · May 2017

We have a vertica cluster with 5 nodes in our company, but one of the nodes (node 5), always has significant higher work load than other nodes, high IO, memory, #number of running queries, and number of threads are all much higher.
Please refer to the following figure, can someone help to explain why?

Thank you

emoreno · May 2017

For what I see in the graph you are also running more queries in NODE05 compared to the others. Do you have a load balancer?

With a query like this you can verify how many queries you run in each of the nodes.
select node_name, date_trunc('day',query_start::timestamp) day_query, count(*) from query_profiles group by 1,2 order by 2 desc,1 limit 100;

Check if you do have more in node5 and if so try to rebalance the workload between the nodes.

RaghavA · May 2017

Hi John,

Emoreno's suggestion is a good way to start investigating. Just to modify it, please add "query_type" too in above SQL to see which type of query is not evenly load balances by your application/load_balancer.

In my experience, it is seen that certain type of load creates like "LOAD" events are evenly distributed but 1 application is only firing SELECT DML's to 1 node only which causes this imbalance.

Try using this SQL
select node_name, date_trunc('day',query_start::timestamp) day_query,query_type,count(*) from query_profiles group by 1,2,3 order by 2 desc,1 limit 100;

it has to be some Application issue which is connecting to database.

Regards,
Raghav Agrawal.

John_Y · May 2017

thank you very much for your help, those queries do help!

RaghavA · June 2017

Hi John,

Glad to hear that.

ckotsidimos · June 2017

Is your issue solved?? Have a similar issue, but concluded that a projection change is needed.

John_Y · June 2017

we were able to find some queries which hits directly to one of our nodes, and redirect them to the load balance, but we are still having the problem of high IO usage., the IO is almost 100% all the time, although CPU and memory are reasonable. Overall queries speed is super slow. Please take a look at the graph below

Can anybody suggest us some ways to determine the reason of problem?
Thank you.

TomM · June 2017

Have you heard of the script vioperf? I wonder if that would help you diagnose what's going on. https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/InstallationGuide/scripts/vioperf.htm?Highlight=vioperf

John_Y · June 2017

Thank you, I just tested and here's the result, it looks like the throughput is really low? any way to dive deep into it?

ckotsidimos · June 2017

@John_Y I would suggest, profile the queries the db has. Run a designer to get to see what projections you are missing and then Check the tuple mover operations

emoreno · June 2017

@John_Y , if you run vioperf while the database is running the numbers are going to be slow as vioperf simulates work load + the workload of the database. I recommend you to run it with the database down or not workload and/or involve your sysadmin or hardware people to tell you if the IO is slow.

You can monitor the active workload with SAR tools or iostat but there is nothing that you can do in Vertica, if the disk are slow you need to check with infrastructure as seems that you are ioboud.

Does make sense?

emoreno · June 2017

@ckotsidimos projection design for a high IO is not the right solution. There is way that you can reduce the storage with some type of encoding so you have less to read but those will pay in CPU cycles. You need to meet the Vertica recommendations in IO, if you don't meet that you will be IO bound. There is some info in the doc : https://my.vertica.com/kb/Redesigning-Projections-for-Query-Optimization/Content/BestPractices/Redesigning-Projections-for-Query-Optimization.htm

I am wondering why is IO, it is right or writes? On writes you can improve the performance if you reduce the number of projections for the same table but for read you may affect the query performance if you change the configuration just to reduce the storage foot print.

Hope that make sense.

Vertica cluster with unbalanced work load cross nodes

Comments

Leave a Comment