Load Balancer

"Client connections made through the Virtual IP (VIP) are managed by a primary (master) director node, which is one of the real server nodes (RIP)." Does this mean I need to request one more IP address for the first node, so that it can also serve as the load balancer?

Comments

  • Hi Abhishek, Actually, Vertica itself has no notion of a "master" node. I assume you are referring to Vertica's IPVS load-balancer service? IPVS is a separate service from the core Vertica server. It needs two computers (if you want the full master/slave setup to help with failover), and it expects those to be separate physical machines, not nodes in the Vertica cluster at all. One of those machines will become the "master" and the other the "slave." (The reason for having dedicated hardware is that IPVS itself imposes some load on the system; if you give one machine a lot of extra load by having it handle all incoming connections, that partially defeats the purpose of running a load balancer in the first place. The dedicated IPVS machines can be fairly cheap; all they need are really fast network cards.) It may be possible to run IPVS on the same physical machines as your cluster. In that case, yes, you would need an additional IP address. (Or you would have to play interesting games with port numbers.) But if you're asking this question, I should ask you: why are you trying to run IPVS? Vertica does not typically need a load balancer. Vertica allows you to connect to any node in the cluster. (And regardless of which node you connect to, a typical query will already be distributed automatically across all nodes.) So in many cases you don't need a load balancer to get good performance. There are some use cases where a load balancer is very important, but our strong recommendation is not to set up a load balancer unless you already have a running cluster and you know that you need one. Adam
  • Thanks, Adam! My Vertica cluster currently has 3 nodes, so I understand it's better not to bring IPVS into the picture; IPVS is best suited to a larger cluster with lots of concurrent connections. Would IPVS be useful if I want to give out a single IP address or hostname for people to connect to, rather than giving them a list of nodes to choose from? Abhi
  • Hi Abhi, Glad to help! IPVS could possibly help on a three-node cluster like yours, but in my experience it's not always necessary, and it always introduces a lot of complexity and confusion. If you want to give out just one hostname, here's a safe strategy: request an extra hostname, say "vertica-cluster", give that name out, and point it at one arbitrarily chosen node in your cluster. You might find that you get good enough performance this way, in which case you've saved a lot of effort (and the server) that you would have needed for load balancing. Or you might find, based on your workload, that that node is getting overloaded and that you need a little more performance. Then you can install IPVS on an extra computer, test it, and when it's ready, update the "vertica-cluster" hostname to point to the IPVS server. Everyone then becomes load-balanced automatically. Adam
  • Thanks, Adam! I was getting this error (Severity: ERROR; Timestamp: 18/07/2013 16:01:58; Node: node01_eitinfapv103; Thread: READER_1_1_1; Message Code: RR_4036): "Error connecting to database [ 523 92[Vertica][VerticaDSII] (160) Connection attempt failed: FATAL 4060: New session rejected due to limit, already 50 sessions active." So I decided to go for a load balancer, which will distribute the load across all nodes in my cluster, and I have got one virtual IP for the load balancer. It would be really helpful if you could let me know the steps to connect to the Vertica database through the virtual IP. Please let me know if I am on the right path. Thanks in advance
  • Hi, The instructions for installing IPVS are in the Administrator's Guide: https://my.vertica.com/docs/6.1.x/PDF/HP_Vertica_6.1.x_AdminGuide.pdf Look for the "Load Balancing" section under "Managing the Database". (It's about 12 pages of instructions; I don't think I can usefully reproduce them in this little comments box :-) ) Adam
  • Also, regarding multiple connections: maybe a load balancer is the right thing to do; maybe it's not. What does system-resource utilization look like on the nodes? Is it evenly distributed? One alternative would be to increase the connection timeout on the resource pool (again, see the "Best Practices for Managing Workload Resources" section of the Administrator's Guide). Or just increase the connection limit on the initiator node. Again, I would use load balancing if and only if the actual load (as measured by tools like "top", "iotop", "iftop", etc.) is uneven. Which it may well be, but you didn't say :-) Hitting the connection limit can be a hint to check system-resource usage, but it's not automatically a reason to use IPVS. You say you're hitting the 50-connection limit; do you have 50 CPU cores to run all those queries? If you are maxing out Vertica's connection pool and getting these errors, that might well mean that you want more nodes to help with the work of setting up new connections, in which case you should install IPVS. But it often means that Vertica needs more compute power to handle your workload and is bogging down. In that case the problem isn't concurrency, and adding more queries to the cluster will just make things worse: Vertica may run out of RAM and have to do things like spilling JOINs to disk, it will spend more time context-switching between all those threads and queries, and so on. There's a reason Vertica's connection limit defaults to 50: that's about the point where the answer to these questions is "it depends", where you have to start thinking about what's actually the bottleneck in order to figure out what your real problem is.
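Adam's first reply describes IPVS as a separate director that owns the virtual IP and hands each incoming connection to one of the real Vertica nodes. The toy Python model below illustrates only that routing idea and is not IPVS code. It assumes round-robin scheduling: IPVS offers several schedulers, and round robin is simply the easiest to show. The addresses are made up.

```python
from itertools import cycle


class RoundRobinDirector:
    """Toy model of an IPVS-style director: a virtual IP (VIP) in front of
    several real servers (RIPs). Each new client connection is handed to the
    next real server in turn; the director itself runs no queries."""

    def __init__(self, vip, real_servers):
        if not real_servers:
            raise ValueError("need at least one real server")
        self.vip = vip
        self._next_server = cycle(real_servers)

    def route_new_connection(self):
        # Choose the real server (Vertica node) that will own this session.
        return next(self._next_server)


# Hypothetical addresses: a VIP on a dedicated director machine and three
# Vertica nodes acting as real servers.
director = RoundRobinDirector("10.0.0.100", ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([director.route_new_connection() for _ in range(4)])
# prints ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```

The director only decides which node a new session lands on. Whichever node accepts the session, a typical query is still distributed across the whole cluster, as the reply points out.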
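Adam's single-hostname strategy works because clients depend only on a name, not on which machine answers to it. The standard-library Python sketch below shows the client side of that idea. It resolves a published hostname and opens a TCP connection to the first resolved address that accepts. It stands in for what a real Vertica driver does internally and does not speak the Vertica protocol. "vertica-cluster" is the example alias from the thread, and port 5433 is assumed to be the Vertica default client port.

```python
import socket


def connect_via_alias(hostname, port=5433, timeout=3.0):
    """Resolve one published hostname and return a TCP socket connected to
    the first resolved address that accepts the connection. Whether the name
    points at a single node or at an IPVS director is decided in DNS, so this
    client code does not change when the target does."""
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
        hostname, port, type=socket.SOCK_STREAM
    ):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            sock.close()
            last_error = exc
    raise OSError(f"could not connect to {hostname}:{port}") from last_error


# Hypothetical usage against the alias from the thread (needs a reachable host):
# conn = connect_via_alias("vertica-cluster")
```

Repointing "vertica-cluster" from one node to an IPVS server later is purely a DNS change. Nothing on the client side needs to be redeployed.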
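The error in Abhi's report hints at why a single entry point hurts: every session lands on the node behind that one address. The simulation below assumes, as the FATAL 4060 message implies, that the 50-session limit is enforced separately on each node. In Vertica this is, to my knowledge, the MaxClientSessions configuration parameter. The node names are hypothetical.

```python
from collections import Counter

SESSION_LIMIT_PER_NODE = 50  # value reported in the FATAL 4060 message


def place_sessions(n_sessions, nodes, choose_node):
    """Open n_sessions one after another, letting choose_node pick the target
    node for session i. Returns (accepted sessions per node, rejected count),
    assuming each node enforces its own session limit."""
    active = Counter()
    rejected = 0
    for i in range(n_sessions):
        node = choose_node(i, nodes)
        if active[node] < SESSION_LIMIT_PER_NODE:
            active[node] += 1
        else:
            rejected += 1
    return dict(active), rejected


def single_entry_point(i, nodes):
    return nodes[0]  # everyone uses the one published node address


def round_robin(i, nodes):
    return nodes[i % len(nodes)]  # what a round-robin director would do


nodes = ["node01", "node02", "node03"]  # hypothetical three-node cluster
print(place_sessions(60, nodes, single_entry_point))
# prints ({'node01': 50}, 10)
print(place_sessions(60, nodes, round_robin))
# prints ({'node01': 20, 'node02': 20, 'node03': 20}, 0)
```

Spreading sessions raises the aggregate ceiling to about 50 per node, but it adds no CPU or RAM. Adam's question about system-resource utilization therefore still decides whether this is the right fix.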
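Adam's decision rule in the last reply can be written down as a rough triage function: balance only if measured load is uneven, and if every node is busy, more connections will not help. This is a hypothetical sketch. The per-node CPU figures would come from tools such as top, and both thresholds are arbitrary placeholders rather than Vertica guidance.

```python
from statistics import mean


def diagnose(cpu_by_node, active_sessions, cores_per_node,
             imbalance_threshold=0.25, busy_threshold=0.85):
    """Rough triage following the advice above. cpu_by_node maps a node name
    to its average CPU utilization in [0, 1]. Returns a verdict string and the
    number of active sessions per CPU core across the cluster."""
    loads = list(cpu_by_node.values())
    spread = max(loads) - min(loads)
    total_cores = cores_per_node * len(cpu_by_node)
    sessions_per_core = active_sessions / total_cores
    if spread > imbalance_threshold:
        verdict = "uneven load: load balancing may help"
    elif mean(loads) > busy_threshold:
        verdict = "all nodes busy: more compute needed, not more connections"
    else:
        verdict = "load looks even: check workload settings before adding IPVS"
    return verdict, round(sessions_per_core, 2)


# Hypothetical readings: one hot node, two mostly idle ones.
print(diagnose({"node01": 0.95, "node02": 0.30, "node03": 0.35},
               active_sessions=50, cores_per_node=8))
# prints ('uneven load: load balancing may help', 2.08)
```

The sessions-per-core figure echoes the "do you have 50 CPU cores?" question. Well above one session per core, the queries are competing for CPU regardless of which node accepted the connection.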
