Balancing Vertica for resiliency

Good day everyone,
we have a 3-node Vertica 8 cluster running on SLES 11 VMs.
During the backup process one node sometimes stops (I will open a separate thread on this topic).

My task is to find a way to ensure that clients can still connect to one of the running nodes when this happens. Native load balancing is not the solution, so I'd like to try an open source solution like HAProxy.
Has anyone tried this? I've read that someone is using a hardware load balancer: could you please share some information about the configuration used, i.e. what is used to check the availability of a node (port 5433 listening?)
That would be very nice, thank you in advance
Alessandro

Comments

  • edited February 2017

    Follow-up: it turned out to be very easy to configure HAProxy in TCP mode for load balancing.
    Now I "only" have to configure checks so that HAProxy can detect when a Vertica host is down.

    The HAProxy config I've made is

    listen vertica-cluster
    bind 0.0.0.0:5433
    mode tcp
    balance roundrobin

    server vertica01 10.0.1.8:5433
    server vertica02 10.0.1.9:5433
    server vertica03 10.0.1.10:5433

  • edited February 2017

    This simple HAProxy config seems to cover the basic needs: it balances the load and avoids sending connections to nodes that are down.

    frontend vertica
    bind 0.0.0.0:5433
    mode tcp
    option tcplog
    default_backend vertica_cluster

    backend vertica_cluster
    mode tcp
    balance leastconn
    server vertica01 1.1.1.8:5433 check fall 1 rise 2
    server vertica02 1.1.1.9:5433 check fall 1 rise 2
    server vertica03 1.1.1.10:5433 check fall 1 rise 2
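
For readers wondering what the bare `check` keyword tests: as far as I understand it, with no other check option HAProxy only attempts a TCP connection to the server port, and `fall 1 rise 2` marks a node down after one failed attempt and up again after two successful ones. A rough Python equivalent of a single probe is sketched below; the function name and the timeout value are only illustrative and are not part of HAProxy.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    This only shows that something is listening on the port; it says
    nothing about whether the database behind it can answer queries.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False
```

For example, `port_accepts_connections("1.1.1.8", 5433)` would probe the first node from the config above.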

  • edited February 2017

    Just add something like

    timeout client 8h   # adjust to your needs

    to avoid disconnections when a client is inactive.

  • If this is just checking whether port 5433 is accepting a connection, then also consider the scenario where a node is recovering. In this case the process is running and listening on port 5433, but if you try to connect you get a message about the node being in recovery. This is at least true in 7.2; I haven't tested it in 8.0. So you still want the node to remain out of the load balancer until it's truly up. One way to do this is a health check that sends a "select 1" query to every node to verify that it is responding to queries.

    --Sharon
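
A minimal sketch of such a "select 1" check in Python, in case it helps someone. It assumes the `vertica-python` client is installed; the `healthcheck` user, its password and the `mydb` database name are placeholders, not values from this thread. The argument order in `main` follows what HAProxy's external check mechanism passes to a script (proxy address, proxy port, server address, server port), and the script exits 0 only when the node actually answers the query.

```python
#!/usr/bin/env python3
import sys


def node_is_healthy(connect, host, port):
    """Return True only if host:port completes a SELECT 1 round trip.

    `connect` is any callable returning a DB-API style connection. It is
    passed in so the logic can be exercised without a real database.
    """
    try:
        conn = connect(host, port)
    except Exception:
        # Refused, timed out, or rejected (for example a node in recovery).
        return False
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        row = cur.fetchone()
        return row is not None and row[0] == 1
    except Exception:
        return False
    finally:
        conn.close()


def vertica_connect(host, port):
    # Assumption: vertica-python is installed. The credentials and database
    # name are placeholders; use a low-privilege account of your own.
    import vertica_python
    return vertica_python.connect(
        host=host, port=int(port), user="healthcheck",
        password="changeme", database="mydb", connection_timeout=5,
    )


def main(argv):
    # argv[1:5] = proxy_addr, proxy_port, server_addr, server_port
    server_addr, server_port = argv[3], argv[4]
    return 0 if node_is_healthy(vertica_connect, server_addr, server_port) else 1


# Only run as a check when the four expected arguments are present.
if __name__ == "__main__" and len(sys.argv) >= 5:
    sys.exit(main(sys.argv))
```

On the HAProxy side this kind of script is normally wired in with `option external-check` and `external-check command` in the backend, plus `external-check` in the global section. I have not tested this against a Vertica cluster, so treat it as a starting point.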

  • Thank you for adding this info!
    In my scenario we probably don't need such a health check, so I will probably not spend time trying it, but this may help someone else someday.
    Alessandro
