How to try to install the IPVS RPM and stay alive (or: it fails with dependencies)

When trying to install the IPVS load balancer RPM I get these error messages:

    rpm -Uvh VerticaIPVSLoadBalancer-6.1-0.RHEL5.x86_64.rpm
    error: Failed dependencies:
        is needed by VerticaIPVSLoadBalancer-6.1-0.RHEL5.x86_64
        is needed by VerticaIPVSLoadBalancer-6.1-0.RHEL5.x86_64
        is needed by VerticaIPVSLoadBalancer-6.1-0.RHEL5.x86_64


  • Linux 6.3 (64-bit). Phase one: install the Red Hat v5 RPM for IPVS, then install the RPM for v6. It fails on dependencies. A community server is running on the servers.
  • Hi yamy, I'm not quite clear on what you're doing -- you're trying to install the RHEL 5 RPM on RHEL 6? What happens if you just install the RHEL 6 RPM on RHEL 6, and skip over the RHEL 5 RPM? Adam
  • Hi Adam, As usual I forgot to state that first :-) OK, I have installed a three-node Vertica database. I started trying to figure out how the whole master/slave business works and stumbled upon the following URL in the installation guide, where it explains IPVS and states clearly that for Linux v6 you have to install/upgrade the RPM for RHEL 5 first, and only after that the RPM for Linux 6. Thanks,
  • Also, I think I tried to install the RPM directly, which failed too.
  • And finally, the DB was up, which may be problematic too ... (oops)?
  • Hi yamy, Hm... I believe that is a typo in the guide. Thanks for pointing it out. You should not need to install the RHEL 5 RPM first; if you're on RHEL 6, just go straight to the RHEL 6 RPM. Could you try again to install the RHEL 6 RPM, and report exactly what's not working for you? For what it's worth: It's not necessary to use IPVS; I think relatively few people do. You can simply connect directly to any node in the database. (Vertica is not a traditional master/slave database -- any node can be the "master", or the initiator, for any query; you can have different people connected to multiple nodes at the same time, etc.) You only need IPVS if you want to give out a single IP address or hostname for people to connect to, rather than giving them a list to choose from; and if you don't want to just pick one node and give out that IP address because you're seeing too much load on that node from having to handle all the connections. That said, if you do want IPVS, it should certainly work for you :-) Adam
  • OK, good, we made two issues clear. Master/slave is not relevant, I understand; you connect to a specific node and that is the one you can query and update, right? The other issue is the VIP: once you create that, you should be able to see some load balancing, right, as it has become one IP instead of "n". On that VIP we can put a CNAME to make it entirely virtual. I will try to run it again. Thanks,
  • For the one node, yeah, that's at least mostly right: whichever node you connect to will initiate the query. If the query asks for data that is not stored locally on that computer, then that node will automatically connect to the rest of the cluster and run a distributed query across (potentially) the whole cluster. In that sense, it sort of is like a master/slave setup: you may be connecting to one node initially, but you're using the whole cluster. The difference is that with a master/slave setup, there's one computer that's the "master", and you always have to connect to that one computer. With Vertica, you can initially connect to any machine / IP, and that computer will figure out what to do to run your query on the shared cluster. Vertica's approach has several advantages. The biggest advantage over a traditional master/slave setup is, what if the master node goes down? Then you can't run any queries. But what if a node in a Vertica cluster goes down? No problem; just pick a different node to connect to, and keep going. IPVS, though, has the same problem as a master: if the one machine running IPVS goes down, nobody can reach the cluster through it. IPVS really is something of a special-case tool. I would definitely recommend that you not use it initially; just have people connect directly to one or more nodes. Most users find that this is sufficient to make good use of the whole cluster. IPVS is really best suited to a larger cluster with lots of concurrent connections and no dedicated hardware load-balancer; then you can install IPVS on some spare machine and have it act like a dedicated load-balancer.
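The "just pick a different node and keep going" idea can be sketched on the client side roughly as follows. This is only an illustration: the host names and the `fake_connect` stand-in are hypothetical, and a real client would pass its own database driver's connect call instead.

```python
from typing import Callable, Sequence, TypeVar

Conn = TypeVar("Conn")


def connect_to_any(hosts: Sequence[str], connect: Callable[[str], Conn]) -> Conn:
    """Try each host in order and return the first connection that succeeds."""
    errors = []
    for host in hosts:
        try:
            return connect(host)
        except OSError as exc:  # refused, unreachable, timed out, ...
            errors.append(f"{host}: {exc}")
    raise ConnectionError("no node reachable: " + "; ".join(errors))


def fake_connect(host: str) -> str:
    """Hypothetical stand-in for a real driver call; pretends node01 is down."""
    if host == "node01":
        raise ConnectionRefusedError("node is down")
    return f"connected to {host}"


# node01 fails, so the client silently falls through to node02.
print(connect_to_any(["node01", "node02", "node03"], fake_connect))
```

Because any node can initiate a query, the order of the host list only decides which node carries the connection, not which data is reachable.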
  • OK, thanks. BTW, are there commands to see what is running on which node? I come from the Oracle RAC world, so maybe I'm trying to draw parallels that do not exist. In any case, following your suggestion, I will hold off on IPVS and check the physical LB option. Thank you, Adam.
  • (To clarify: "not use it initially" -- It might be that you do want it for your use case. But it's a lot of complexity and has some downsides. So try first without it, and only install it if you really need it. If you want to plan for it, maybe you could make a CNAME for it but initially just point it at one of the nodes in the cluster. Then update it to point at IPVS later if you need it.)
  • Hm... If you run "select session_id, node_name, current_statement from sessions;", that table will tell you the node that each current session is connected to, and the statement that that session is currently executing (if any). If you run "select distinct node_name, session_id from execution_engine_profiles;", that will tell you which nodes are actually executing each query. (If you omit the "distinct" and add some more columns, this table will tell you, probably in more detail than you really want, exactly what each node is doing.) Throw in some joins, and hopefully you can get what you're looking for.
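One possible shape for such a join is sketched below. It runs against an in-memory SQLite database with two mock tables that borrow only the table and column names mentioned above; the sample rows are invented, and joining on session_id alone is an assumption that may need extra columns on a real cluster.

```python
import sqlite3

# Mock tables standing in for the two system tables named in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (session_id TEXT, node_name TEXT, current_statement TEXT);
    CREATE TABLE execution_engine_profiles (node_name TEXT, session_id TEXT);
    INSERT INTO sessions VALUES
        ('s1', 'node01', 'select count(*) from facts'),
        ('s2', 'node02', '');
    INSERT INTO execution_engine_profiles VALUES
        ('node01', 's1'), ('node02', 's1'), ('node03', 's1'), ('node01', 's1');
""")

# For each session: the node it is connected to, and every node doing work for it.
JOIN_QUERY = """
    SELECT DISTINCT s.session_id,
           s.node_name AS connected_node,
           p.node_name AS executing_node,
           s.current_statement
    FROM sessions s
    JOIN execution_engine_profiles p ON p.session_id = s.session_id
    ORDER BY s.session_id, p.node_name
"""

rows = conn.execute(JOIN_QUERY).fetchall()
for row in rows:
    print(row)
```

In this mock data, session s1 is connected to node01 but has work running on all three nodes, while the idle session s2 drops out of the inner join.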
  • Incidentally, if you'd like to monitor this sort of thing, you might be interested in our new Management Console. We just released (within the past couple days) Vertica 6.1 SP2, for both enterprise and community users; we also have an updated management-console package that works with the new version and which has many graphs that summarize the current state of the cluster for you. (We had a management console previously, but it was much heavier-weight and required a dedicated server.)
  • Hello, My name is Gil Peretz, and I am responsible for Vertica Platform development for SaaS @HP. Regarding the load balancer (LB): our architecture has three layers: loaders into Vertica, ETL processes, and queries from external applications. We are supposed to be able to sustain 600-1000 tenants, where each tenant will have two schemas (staging and target). Each tenant can have between 50 and 200 entities (dimensions and facts). Loaders and ETLs run on the tenant's staging schema and finally MERGE the data into the target. Having said that, when the whole "ceremony" is playing together, there can be hundreds of sessions running concurrently. It seems that an LB can help us control the number of connections/sessions into Vertica. Assigning the connections of each of our application servers (there can be 4-5 of them to begin with) to a different Vertica cluster machine in a "round robin" fashion is not good practice. This is why I am struggling with the IPVS LB. BTW, I had no issues installing IPVS (using Vertica 6.1.1), but I haven't activated or used it YET. Regards, Gil Peretz
  • Hi Gil, It's great to hear from you! And to hear about a real Vertica deployment. Yeah, SaaS is one of the applications that really can benefit from something like Vertica IPVS. IPVS is actually a generic Linux application for load-balancing arbitrary TCP connections. (It's a part of the Linux kernel in our supported distributions.) The Vertica IPVS package just enables it, configures it, and teaches it how to tell which nodes are in a Vertica cluster (and are up). There's a bunch of documentation for IPVS online, and it supports various load-balancing schemes other than round-robin. So I suspect that you can configure it to do whatever you'd like. Probably one of the hashing schemes would better meet your needs? Such a configuration change would not be supported by Vertica tech support; we just support the canned configuration. I've never tried it myself. But it would likely be supported by the Linux IPVS folks; you could get in touch with them. Also, Vertica sysadmins operating something multi-tenant at that scale sometimes solve this problem with a dedicated load-balancing appliance. I have not worked with these myself. But I would imagine that they could give better performance with a large number of connections, and that they are a more widely understood technology than IPVS. Just another option for you to consider, based on your needs. For an initial database, though, such as yamy's three-node cluster, from experience I do think it's best to start off simple and to add complexity only if/as necessary. Adam
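To make the difference between round-robin and a hashing scheme concrete, here is a toy sketch of the two selection rules. It is not IPVS's actual implementation or configuration: the node names, the client address, and the choice of SHA-256 are all illustrative assumptions.

```python
import hashlib
from itertools import cycle

NODES = ["node01", "node02", "node03"]  # hypothetical cluster members


def round_robin(nodes):
    """Return a picker that ignores the client and simply rotates through nodes."""
    rotation = cycle(nodes)
    return lambda client_ip: next(rotation)


def source_hash(nodes):
    """Return a picker that always maps the same client address to the same node."""
    def pick(client_ip):
        digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
        return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]
    return pick


rr = round_robin(NODES)
sh = source_hash(NODES)

# The same application server opening four connections in a row:
rr_picks = [rr("10.0.0.5") for _ in range(4)]
sh_picks = [sh("10.0.0.5") for _ in range(4)]
print(rr_picks)  # spreads the connections over all nodes in turn
print(sh_picks)  # keeps this client pinned to a single node
```

With round-robin, each application server's sessions end up scattered across the cluster; with a source hash, each server sticks to one node, which may or may not suit a multi-tenant workload.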
  • Hello Adam, many thanks for your answers. An additional point in favor of an LB is that the initiator node in many cases uses more CPU cycles than the other nodes, mainly for aggregation queries, because it computes the grand totals from all nodes.
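The extra work on the initiator can be pictured with a toy model like the one below. It is only a sketch of the general partial-aggregation idea, not a description of Vertica's execution engine; the node names and rows are invented.

```python
from collections import Counter

# Hypothetical rows stored on each node: (group key, amount).
node_rows = {
    "node01": [("eu", 10), ("us", 5)],
    "node02": [("eu", 7), ("us", 1)],
    "node03": [("us", 4)],
}


def partial_totals(rows):
    """Work done on every node: sum the locally stored rows per group."""
    totals = Counter()
    for key, amount in rows:
        totals[key] += amount
    return totals


def merge_on_initiator(partials):
    """Extra work done only on the initiator: combine all partial results."""
    grand = Counter()
    for partial in partials:
        grand.update(partial)  # Counter.update adds counts per key
    return dict(grand)


partials = [partial_totals(rows) for rows in node_rows.values()]
grand_totals = merge_on_initiator(partials)
print(grand_totals)
```

Every node does its share of the per-node sums, but only the node that received the connection also pays for the final merge, which is why spreading connections across nodes spreads that cost too.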
