Balancing Vertica for resiliency

atinivelli · February 2017

Good day everyone,
we have a 3-node Vertica 8 cluster running on SLES 11 VMs.
During the backup process, one node sometimes stops (I will open a separate thread on this topic).

My task is to find a way to ensure that clients can connect to one of the running nodes when this happens. Native load balancing is not the solution, so I'd like to try an open source solution like HAProxy.
Has anyone tried this? I've read that someone is using a hardware load balancer: could you please share some information about the configuration used? For example, what is used to check the availability of a node (port 5433 listening?)
It would be very helpful. Thank you in advance,
Alessandro

atinivelli · February 2017

Follow-up: it turned out to be very easy to configure HAProxy in TCP mode for load balancing.
Now I "only" have to configure checks so that HAProxy can detect when a Vertica host is down.

The HAProxy config I've made is:

listen vertica-cluster
bind 0.0.0.0:5433
mode tcp
balance roundrobin

server vertica01 10.0.1.8:5433
server vertica02 10.0.1.9:5433
server vertica03 10.0.1.10:5433

atinivelli · February 2017

This simple HAProxy config seems to cover the basic needs: it balances the load and avoids sending connections to nodes that are down.

frontend vertica
bind 0.0.0.0:5433
mode tcp
option tcplog
default_backend vertica_cluster

backend vertica_cluster
mode tcp
balance leastconn
server vertica01 1.1.1.8:5433 check fall 1 rise 2
server vertica02 1.1.1.9:5433 check fall 1 rise 2
server vertica03 1.1.1.10:5433 check fall 1 rise 2
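
For readers unfamiliar with the check keywords: a plain "check" in TCP mode only tests whether a connection to the port can be opened, "fall 1" marks a server down after one failed check, and "rise 2" marks it up again after two consecutive successful checks. The Python sketch below is a simplified model of that behaviour, not HAProxy's actual implementation; the names tcp_port_open and RiseFall are made up for illustration.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Rough equivalent of a plain TCP check: can a connect complete?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class RiseFall:
    """Tracks up/down state from consecutive check results (simplified)."""

    def __init__(self, fall=1, rise=2, up=True):
        self.fall = fall
        self.rise = rise
        self.up = up
        self.streak = 0  # consecutive results that contradict the current state

    def observe(self, ok):
        if ok == self.up:
            # Result agrees with the current state: reset the counter.
            self.streak = 0
        else:
            self.streak += 1
            threshold = self.fall if self.up else self.rise
            if self.streak >= threshold:
                self.up = ok
                self.streak = 0
        return self.up
```

With fall=1 and rise=2, a single failed connect takes the node out of rotation immediately, while a node coming back has to pass two checks in a row before it receives connections again.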

atinivelli · February 2017

Just add something like

timeout client 8h   # adjust to your needs

to avoid disconnections on client inactivity.

Sharon_Cutter · February 2017

If this is just checking whether port 5433 is accepting connections, then also consider the scenario where a node is recovering. In this case the process is running and listening on port 5433, but if you try to connect you get a message about the node being in recovery. This is at least true in 7.2; I haven't tested it in 8.0. So you still want the node to remain out of the load balancer until it's truly up. One way to do this is a health check that sends a "select 1" query to each node to verify that it is responding to queries.

--Sharon
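
A minimal sketch of such a "select 1" check in Python, written so it could be wired in as an HAProxy external check (which calls the command with proxy address, proxy port, server address and server port as arguments). Assumptions: the vertica-python client is installed, and CHECK_USER, CHECK_PASSWORD and CHECK_DB are placeholder values for a hypothetical low-privilege account. The function names are illustrative only, and this has not been tested against a node in recovery.

```python
import sys

# Hypothetical placeholder credentials for a low-privilege check account.
CHECK_USER = "healthcheck"
CHECK_PASSWORD = "change-me"
CHECK_DB = "mydb"


def node_answers_queries(connect, host, port):
    """Return True only if the node at host:port answers SELECT 1."""
    try:
        conn = connect(host=host, port=port)
        try:
            cur = conn.cursor()
            cur.execute("SELECT 1")
            row = cur.fetchone()
        finally:
            conn.close()
    except Exception:
        # Refused connection, recovery message, bad login: all count as down.
        return False
    return row is not None and row[0] == 1


def vertica_connect(host, port):
    # Imported lazily so the check logic can be exercised without the driver.
    import vertica_python
    return vertica_python.connect(
        host=host, port=port,
        user=CHECK_USER, password=CHECK_PASSWORD, database=CHECK_DB,
    )


if __name__ == "__main__" and len(sys.argv) >= 5:
    # Arguments from HAProxy: proxy_addr proxy_port server_addr server_port
    healthy = node_answers_queries(vertica_connect, sys.argv[3], int(sys.argv[4]))
    sys.exit(0 if healthy else 1)
```

Exit code 0 tells HAProxy the node is healthy; anything else counts as a failed check. On the HAProxy side this would need external checks enabled ("external-check" in the global section, plus "option external-check" and "external-check command" in the backend); check the documentation of your HAProxy version for the details.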

atinivelli · February 2017

Thank you for adding this info!
In my scenario we probably don't need such a health check, so I probably won't spend time trying it, but this may help someone else someday.
Alessandro
