Can I set up master-master or master-slave replication on a Vertica cluster?

I have two Vertica clusters, each with 5 nodes (we need some kind of HA). I'd like to use the 2nd cluster as a backup, so that when a critical issue happens on the 1st cluster and it can no longer serve requests, I can point people at the 2nd cluster. So I need to replicate data from the 1st cluster to the 2nd. Is there a way to do this (like MySQL master-master or master-slave replication)?


  • Navin_C Vertica Customer
    Hi Bin Yu,

    Take a look at Fault Groups in Vertica; they might help you.

    Also, Vertica has HA built into the cluster in the form of K-safety. It takes care of node failures in the cluster, and the internal load balancing and connection failover let you keep running even if some nodes are down.

    But to get this effect at the cluster level (if many nodes in a cluster go down, the database goes down), fault groups can help you configure that and add additional HA for your data.

    Hope this helps.
  • I know K-safety. I've heard that it is not good to set K-safety to 2 or above because of the performance cost. We have 10 nodes, and I think K-safety=1 is not enough. That is why we split the cluster into two and are trying to use replication to keep the two clusters in sync.
    I am not sure whether, in the latest version 7.1.3, it is OK to set K-safety=2 without a performance penalty.
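    For context, a quick sketch of the node-count rules (as I understand the Vertica documentation: a cluster needs at least 2k+1 nodes to run at K-safety k, and Vertica caps K at 2):

```python
# Back-of-the-envelope K-safety sizing rules (my reading of the Vertica docs):
def min_nodes_for_ksafety(k: int) -> int:
    """A cluster needs at least 2k+1 nodes to run at K-safety k."""
    return 2 * k + 1

def max_ksafety(nodes: int) -> int:
    """Highest K-safety a cluster of this size can support (Vertica caps K at 2)."""
    return min((nodes - 1) // 2, 2)

# A 10-node cluster can run at k=2 (needs only 5 nodes),
# and so could each 5-node half after a split.
```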

  • Hi Bin,

    Regarding k>=2, perhaps you could explain your use case in more detail? If there's a need for more HA reliability, we would certainly like to address that.

    There was an instance a little while back where we were looking at a cluster with I think a little over a hundred nodes. Its owner was complaining about reduced performance. We quickly realized that they had something like 15 nodes down, and they just hadn't noticed because everything kept working. This, like most Vertica clusters, was at k=1.

    Usually people notice when their servers crash, so it's uncommon for people to really stress K-safety. (That cluster could have survived more down nodes, though with decreasing performance.) But we're actually pretty robust in terms of failure probability at k=1, and very robust at k=2. We often hear that users aren't interested in paying their hardware vendor for the extra hard drives needed to store more replicas than that; the HA afforded by k=1 is enough for them.

    If you do want to run at k=2, the performance penalty is simply that Vertica has to write out additional copies of the data: extra disk I/O, extra network traffic, etc. This is true of any replication solution. (There is no impact on query performance; it just takes more time to make the extra copies of the data.) Our implementation does have a few cases where it could be faster, but usually (for most use cases that I've personally seen) the extra disk and network I/O are the primary concerns. If you find that our implementation is particularly slow beyond that, please let us know, either here or through your sales contact, and they may be able to help.
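    To put a rough number on that penalty (a back-of-the-envelope sketch, not a benchmark): at K-safety k each data segment is stored k+1 times, the primary projection plus k buddy projections, so load-time disk and network I/O scale with the copy count:

```python
# Rough write-amplification math for K-safety (a sketch, not a benchmark):
def copies_stored(k: int) -> int:
    """At K-safety k, each data segment is stored k+1 times
    (the primary projection plus k buddy projections)."""
    return k + 1

def load_write_volume_gb(raw_gb: float, k: int) -> float:
    """Approximate data written during a load: one copy per replica.
    This is the extra disk/network I/O at load time; queries are unaffected."""
    return raw_gb * copies_stored(k)

# Moving from k=1 to k=2 raises load I/O from 2x to 3x the raw data size.
```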

  • Navin_C Vertica Customer
    Hi Adam,

    Thanks for pointing out the concerns about reduced performance.

    To expand on the same:

    Keeping the K-safety factor at 2 will only impact load, rebalance, and refresh operations in your database.
    As Adam said, you may sometimes find query performance to be good with k = 2.
    That's due to the availability of data on more nodes: Vertica does not have to reach as far across the cluster to fetch data, which makes life easier for the query optimizer.

    Regarding your cluster size:
    I think K = 1 for a 10-node cluster is going to be just fine.

    >>We have 10 node, K-safety=1 is not enough I think. So that is the reason we split the cluster into 2 and try to use replication to make 2 cluster in sync. 

    If you think k=1 is not safe enough for your cluster and are thinking of splitting it in half (5-node cluster 1 and 5-node cluster 2), then you are indirectly reducing the capacity and the performance of the cluster.
    Even after splitting the cluster, you will need a CDC tool to keep both clusters in sync, which is additional workload for cluster 1 and will impact query performance on cluster 1.
    And surely you will have to develop a new mechanism for cluster failover support, i.e. when critical nodes in cluster 1 go down, switch all sessions/connections to cluster 2, and vice versa.
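    That failover mechanism can be as simple as trying each cluster in order at connect time. A minimal, hypothetical sketch (the connect callables are stand-ins; in a real client they might wrap `vertica_python.connect(...)` for cluster 1 and cluster 2):

```python
# Hypothetical client-side failover sketch: try each cluster in order.
# The connect callables are injected stand-ins; in a real client they
# might each wrap vertica_python.connect(...) for one cluster's hosts.
def connect_with_failover(connect_fns):
    """Return the first successful connection; raise if every cluster is down."""
    last_err = None
    for connect in connect_fns:
        try:
            return connect()
        except Exception as err:
            last_err = err  # this cluster is unreachable; try the next one
    raise ConnectionError("all clusters are down") from last_err
```

    This only covers new sessions; already-open sessions on a failed cluster would still need to reconnect through the same path.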

    One more solution is to have active standby nodes for your critical nodes.
    Suppose you have an 8-node cluster and 2 nodes in standby mode; then you have reduced the risk of cluster failure, as failed nodes will be replaced by standby nodes.

    Hope this helps
