Regrading distributed session among nodes

hsaxena20 · October 2019

We have 3 node cluster under the load balancer. But when i ran below query it will give me most of the count on single node.
SELECT distinct user_name FROM dc_requests_issued WHERE
date(time) >= date(sysdate()) ;

Please suggest ASAP. What we are missing?

marcothesane · October 2019

Automatically, this is as good as it gets.
If you want to be a perfectionist, you would have to do that manually:
Connect to Vertica to check which node is least busy with sessions:
SELECT node_name,count(*) FROM sessions /* these are only the currently connected sessions */ GROUP BY 1 ORDER BY 2 LIMIT 1;
Collect the result , determine which IP address corresponds to the node name received, and connect to that node.
You will have a second-long connection to an unintended node (which might falsify the real connection balance a bit more), but you have a chance to balance the really active connections - those that keep firing queries ...

marcothesane · October 2019

Your questions seems misleading in many respects:

With "load balancer", do you mean Connection Load Balancer?
Where do you see, from the query you issue, a node name? Connection Load Balancer tries to make sure that the connections to Vertica are evenly distributed across nodes.

If you mean Connection Load Balancing the way I put it above, try:
SELECT node_name, COUNT(*) AS session_count FROM user_sessions WHERE session_start_timestamp >= CURRENT_DATE::DATE GROUP BY 1;
... to check all nodes that have been used as the connection node today....

hsaxena20 · October 2019

I mean to say number of queries are hitting to only one server. How i can manage this?

marcothesane · October 2019

how do you see that the queries only hit one server? which query? Did you run the query I wrote above? and which is its output?

marcothesane · October 2019

By "which query" , I mean , which query did you run to get the info that the queries only hit one server?

hsaxena20 · October 2019

By this query
SELECT distinct node_name,count(*) FROM dc_requests_issued WHERE
date(time) >= date(sysdate()) group by 1;

2006 - Node1
3823 - Node2
3441- Node3

hsaxena20 · October 2019

Result from below query
SELECT node_name, COUNT(*) AS session_count FROM user_sessions WHERE session_start_timestamp >= CURRENT_DATE::DATE GROUP BY 1;

896 - Node 3
1138 - Node 1
510 - Node 2

SruthiA · October 2019

What is the output of below queries?
SELECT currentclientbalancingpolicy FROM vs_global_settings;
SELECT LOAD_BALANCE_POLICY FROM V_CATALOG.DATABASES;

marcothesane · October 2019

It looks to quite some degree that we have a lot of clients that don't have ConnectionLoadBalancing=TRUE in the setup of their ODBC or JDBC connections. Or simply, vsql called without the "-C" switch to force connection load balancing.
This is my test:
$ for i in {1..3} ; do vsql -C -At -h 172.16.61.128 -u dbadmin -W passwd -d sbx_marco -c "select node_name from current_session"; done
v_sbx_marco_node0001
v_sbx_marco_node0002
v_sbx_marco_node0003

If you omit the "-C" switch, it will always be the first node ...

hsaxena20 · October 2019

Hey SruthiA,

Roundrobin is result of both queries.

marcothesane · October 2019

So dc_requests_issued (and with it, also the official query_requests system table, I'm sure) - returns this distribution:

2006 - Node1
3823 - Node2
3441- Node3

And user_sessions returns:

 896 - Node 3
1138 - Node 1
 510 - Node 2

Each query request issued originates from one session. So we had considerably more connections to node 1 , but they must have been idle (or sending long running, heavy queries) while nodes 2 and 3 got less connections. Node 2 issued the biggest number of queries from the lowest number of connections.

This is what your reports tell us. All 3 nodes are busy, albeit not in a perfectly balanced manner, but definitely no node is being left out.

What remains to clarify is where you now see a problem to which we should "suggest ASAP". Can you explain?

hsaxena20 · October 2019

As you said,
All 3 nodes are busy, albeit not in a perfectly balanced manner, but definitely no node is being left out.
We want to balance in perfect manner.

Sudhakar_B · November 2019

@hsaxena20,
What is your definition of "balance in perfect manner"? Exactly same number of connections on each node? That's meaningless!
Remember it is Load-Balancer and not connection-Balancer and from your metrics it seems load is perfectly balanced.
Thx

We're Moving!

Create My New Community Account Now

Regrading distributed session among nodes

Best Answer

Answers

Leave a Comment