Cluster fail safe?

Another noob question. I have a three-node cluster running on three machines whose IP addresses end in 233, 161, and 171.
Here is the deal.
I am feeding data to the machine with IP 233 via Eclipse, and yes, the data gets replicated across nodes. Today I pulled out the LAN cable of the 233 machine; Eclipse popped an error and the data transfer stopped. I thought the data would go to the other two nodes, but that didn't happen.
Why is this? Is this related to the K-safety factor? The MC often flashes a notification saying the cluster is not ksafe=1.

Please advise.


  • Hi Anmol,

    Vertica does data replication; it doesn't do live failover.  If your client is connected to the 233 machine, that means it literally has a TCP connection to the 233 machine.  When you pulled out the Ethernet cable, you disconnected Eclipse's connection.  If Eclipse loses its connection, that's an error condition; it will error out.

    Vertica 7 (drivers+server) can be configured to do failover -- if you set up driver load-balancing, and the node you are connected to goes down, then you can have your code try to re-connect; the driver will notice that the original node is down and pick a different node.  But you will still have to re-start the transaction that was running; that transaction received an error so it rolled back on the server.
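
    The reconnect-and-restart pattern described above might be sketched roughly as follows. This is a minimal illustration, not driver functionality: RetryingLoader and runWithRetry are hypothetical names, and the unit of work is assumed to open a fresh connection, load the data, and commit on every attempt, since the failed transaction has already been rolled back on the server.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of the Vertica JDBC driver.
final class RetryingLoader {

    // Runs one complete unit of work (connect, load, commit).
    // On SQLException the whole unit is started again from scratch,
    // because the interrupted transaction cannot be resumed.
    static <T> T runWithRetry(Callable<T> unitOfWork, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return unitOfWork.call();
            } catch (SQLException e) {
                last = e; // remember the failure and try the whole unit again
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }
}
```

    In real code the unit of work would obtain its connection inside the callable (for example with DriverManager.getConnection), so that each retry gives the driver a chance to pick a node that is still up.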

    Also -- if you're getting the "cluster is not ksafe=1" warning, that may mean your cluster is not doing data replication at all.  Vertica doesn't replicate your data to all nodes.  (This would defeat the purpose -- if each node had a copy of all the data, then you could never store a really large amount of data in a big Vertica cluster; you could never exceed what would fit on the node.)  It instead segments the data.  With a K-safety of 0 (kind of the simplest case but not the default), each row goes to exactly one node.  So if one node goes down, then you can't access any of the data on it; this isn't allowed in a transactional system (all nodes must always have a 100%-consistent view of committed data), so the whole cluster may go down.  Certainly if you're loading data, we can't keep loading data because there's nowhere to put the rows that are intended for that node.

    With a K-safety of 1, your cluster can survive 1 node failure because every row is copied to two nodes; if one goes down, there's a backup.  (Recent versions of Vertica can actually survive many more nodes going down, as long as you don't lose both a node and its buddy.  See the documentation for more details.)
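
    To make the "node and its buddy" idea concrete, here is a toy model. It assumes, purely for illustration, that each node's segment of the data is also copied to the next node in a ring; the real placement is determined by your projection definitions, and BuddyPlacement is a hypothetical name. The sketch also ignores the separate rule about how many nodes must stay up.

```java
import java.util.Set;

// Toy model of K-safety 1; the ring layout is an assumption, not Vertica's documented placement.
final class BuddyPlacement {

    // Segment i is stored on node i and on its assumed buddy, node (i + 1) % nodeCount.
    // The data stays fully available as long as no segment has lost both of its copies.
    static boolean allDataAvailable(int nodeCount, Set<Integer> downNodes) {
        for (int segment = 0; segment < nodeCount; segment++) {
            int primary = segment;
            int buddy = (segment + 1) % nodeCount;
            if (downNodes.contains(primary) && downNodes.contains(buddy)) {
                return false; // both copies of this segment are gone
            }
        }
        return true;
    }
}
```

    In this model a 7-node cluster still has every row available with nodes 0, 2, and 4 down, but not with nodes 0 and 1 down, because those two hold both copies of one segment.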

    With a K-safety of 2, theoretically you can survive 2 nodes going down, because we make three copies of the data.  Though since more-recent Vertica versions can survive many more nodes going down even with K-safe=1, very few customers feel that it's worth paying for the extra storage.

    Also, Vertica has another rule: more than half of the cluster must be up at all times.  Otherwise, you might have a cluster that was split across two racks, and the link between the two racks might get disconnected.  Technically both halves might have a complete view of the data, but again, we must have a consistent view of the data at all times; we can't do that if half our cluster can't sync with the other half.  "Half+1" of a 3-node cluster is 2 nodes, so a 3-node cluster can lose only one node no matter what its K-safety is.  In fact, you must have at least a 5-node cluster for "K-safe=2" to provide any high-availability advantages.  With smaller clusters than that, it's useful only for testing.
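
    The arithmetic behind that can be sketched as below. This is a simplification based only on the two rules stated in this post (K copies' worth of redundancy, and more than half of the nodes up); KSafetyMath and its method names are hypothetical, and the result is the number of failures that is guaranteed to be survivable, not the best case.

```java
// Hypothetical helper; encodes only the two rules described in this thread.
final class KSafetyMath {

    // Smallest node count that is "more than half" of the cluster.
    static int minNodesUp(int nodeCount) {
        return nodeCount / 2 + 1;
    }

    // Failures guaranteed survivable: capped by K-safety and by the more-than-half rule.
    static int guaranteedTolerableFailures(int nodeCount, int kSafety) {
        int quorumLimit = nodeCount - minNodesUp(nodeCount);
        return Math.min(kSafety, quorumLimit);
    }
}
```

    For a 3-node cluster this gives 1 for both K=1 and K=2, and only at 5 nodes does K=2 raise the answer to 2.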

  • Hi Anmol,
    When you say you pulled the LAN cable on 233, do you mean you disconnected it from your data load program running in Eclipse, or you pulled the node out of the cluster while keeping it connected with the data loader program?
    In either case, you can provide the remaining nodes as BackupServerNode in your connection properties, and the JDBC client driver should then fail over to the remaining nodes and continue to load data.  You will need Vertica 7.x for this to work, and your code should look like:

    connProperties.put("BackupServerNode", "x.x.x.161,x.x.x.171");
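
    For context, a slightly fuller sketch of building those properties might look like this. VerticaConnProps and withBackups are hypothetical names, the "user" and "password" keys are the standard JDBC ones, and the host strings are the same placeholders as in the line above.

```java
import java.util.List;
import java.util.Properties;

// Hypothetical helper that assembles JDBC connection properties with backup hosts.
final class VerticaConnProps {

    // Joins the backup hosts into the comma-separated form used for BackupServerNode.
    static Properties withBackups(String user, String password, List<String> backupHosts) {
        Properties props = new Properties();
        props.put("user", user);
        props.put("password", password);
        props.put("BackupServerNode", String.join(",", backupHosts));
        return props;
    }
}
```

    The resulting Properties object would then be passed as the second argument to DriverManager.getConnection, along with a JDBC URL that points at your primary node.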

    Hope that helps.

  • Adam, Sajan,

    Thank you very much guys. This helps a lot.


    When you say Vertica sometimes does not replicate data to all the nodes, does it depend on the size of the data?

    Right now I am doing a proof-of-concept demo for Vertica. Just started. Got three nodes. Theoretically, for three nodes K=1 should hold true. Am I wrong in my thinking?

  • Hi Anmol,

    The short answer is that you are right in your thinking in terms of Vertica's defaults.  But, those defaults are easy to change.

    If you're getting that message, it usually means that your design has been marked something other than Ksafe=1.  Look up the "MARK_DESIGN_KSAFE()" meta-function in Vertica's documentation.  Note that the function just sets the default K-safety for new tables; you may have to update existing tables to add replication.

    The full answer to your question requires (and is provided by) an understanding of Vertica Projections.  Projections are the storage for a table's data.  Vertica will create a particular standard set of projections for you by default.  But by creating your own projections (or by doing any of a variety of things that change Vertica's behavior in creating projections), you can specify whether a given table is stored replicated on all nodes, or replicated once, or replicated several times but segmented in different ways to match data up for JOINs, etc.  The details of replication have a huge impact not only on reliability and disk-space usage, but on query and data-load performance as well.  People (or, more often, our automated Database Designer tool) design their projections quite carefully.

    The behavior of projections does not depend on the size of the data.  (Though empty tables are a special case -- some things don't get set up until you load your first row.  Probably not relevant to your question, but if you investigate, you may see this in our system tables.)

    You might find the following to be helpful:

    -- Free online training on projections (there's in-person training listed on that site as well, but it's not free)
    -- Vertica Administrator's Guide.  Lots of things here; you might be interested in pages 148, 116, and 163, among various others.
    -- Documentation on things to do with projections

