Why does my cluster have a CURRENT_FAULT_TOLERANCE of 0?
Per the recommendation of Vertica employees here, 3-node Vertica clusters should be set to a K-safety of 1.
When I went to check on the K-safety of my 3-node cluster, this is what I found:
DESIGNED_FAULT_TOLERANCE    CURRENT_FAULT_TOLERANCE
1                           0
Why is my CURRENT_FAULT_TOLERANCE set to 0, and how can I make it 1?
K-safety = 0 means that if any node in your 3-node cluster goes down, the whole cluster goes down.
K-safety is a measure of fault tolerance in the database cluster. The value K represents the number of replicas of the data in the database that exist in the database cluster. These replicas allow other nodes to take over for failed nodes, allowing the database to continue running while still ensuring data integrity. If more than K nodes in the database fail, some of the data in the database may become unavailable. In that case, the database is considered unsafe and automatically shuts down.
Potentially, up to half the nodes in a database with a K-safety level of 1 could fail without causing the database to shut down. As long as the data on each failed node is available from another active node, the database continues to run.
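To make the "up to half the nodes" claim concrete, here is a small Python toy model. It assumes a simple ring layout that is hypothetical rather than taken from Vertica's internals: segment i has its primary copy on node i and K buddy copies on the next K nodes, which mirrors the OFFSET idea behind buddy projections. It only checks whether every segment still has a live copy and ignores any other shutdown conditions, such as node quorum rules.

```python
from itertools import combinations


def placement(num_nodes: int, k: int) -> dict[int, set[int]]:
    # Toy layout (an assumption, not Vertica's internals): segment i has its
    # primary copy on node i and k buddy copies on nodes i+1 .. i+k (mod n).
    return {
        seg: {(seg + offset) % num_nodes for offset in range(k + 1)}
        for seg in range(num_nodes)
    }


def survives(failed: set[int], layout: dict[int, set[int]]) -> bool:
    # Data stays available if every segment keeps at least one live copy.
    return all(nodes - failed for nodes in layout.values())


def every_failure_survivable(num_nodes: int, k: int, size: int) -> bool:
    # Worst case: does *every* choice of `size` failed nodes keep all data?
    layout = placement(num_nodes, k)
    return all(
        survives(set(c), layout) for c in combinations(range(num_nodes), size)
    )


def max_survivable_failures(num_nodes: int, k: int) -> int:
    # Best case: the largest number of failed nodes that *some* choice of
    # failed nodes can survive.
    layout = placement(num_nodes, k)
    return max(
        size
        for size in range(num_nodes + 1)
        if any(survives(set(c), layout) for c in combinations(range(num_nodes), size))
    )


print(every_failure_survivable(3, 1, 1))  # True: any single node may fail
print(every_failure_survivable(3, 1, 2))  # False: two failures lose a segment
print(max_survivable_failures(6, 1))      # 3, e.g. nodes 0, 2 and 4
```

With K = 1 on a 3-node cluster, any two nodes are neighbours on the ring, so only one failure is survivable; on 6 nodes, three non-adjacent failures still leave a copy of every segment. Enumerating every failure set is exponential, which is fine for a handful of nodes.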
You can make your cluster K-safe (K = 1) by calling the following function:
SELECT MARK_DESIGN_KSAFE(1);
It may report that some projections have problems, such as "not having sufficient buddy projections".
In that case, you need to create a buddy projection for each of those projections, defined with OFFSET 1. You can find information on creating buddy projections at the following link:
https://my.vertica.com/docs/6.1.x/HTML/index.htm#12173.htm
Regards
Abhishek
I know what K-safety is, and my cluster already has a designed K-safety of 1, as my original post shows. The query I used to get those results, by the way, is as follows:
My question is why my cluster has a CURRENT_FAULT_TOLERANCE of 0. The link at the very end of your post seems to be related to what I'm asking, but I'm not sure how to use it.
Again, how can I find out why my cluster has a CURRENT_FAULT_TOLERANCE of 0 when the DESIGNED_FAULT_TOLERANCE is 1? And how can I fix it so that they are both 1?
Nick
CURRENT_FAULT_TOLERANCE checks all of your tables and their projections, and determines how many faults the system can withstand in the worst case.
DESIGNED_FAULT_TOLERANCE affects only tables that are being designed. This primarily means new tables, but also existing tables that the Database Designer is run on.
SELECT MARK_DESIGN_KSAFE(...); sets DESIGNED_FAULT_TOLERANCE. So it takes effect going forward, but it doesn't affect existing tables.
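The relationship between the two columns can be sketched as a worst-case minimum. In this hypothetical Python model, each table is reduced to the number of full copies its projections provide, a table with c copies is assumed to survive c - 1 node failures, and the cluster's current value is the minimum over all tables. The table names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    copies: int  # full copies of the data provided by its projections (toy model)


def current_fault_tolerance(tables: list[Table]) -> int:
    # Worst case: the cluster is only as fault tolerant as its weakest table,
    # and a table with c copies survives at most c - 1 node failures.
    return min(t.copies - 1 for t in tables)


# DESIGNED_FAULT_TOLERANCE after SELECT MARK_DESIGN_KSAFE(1);
designed_fault_tolerance = 1

# Hypothetical tables: one created after marking the design K-safe (it got a
# buddy projection), one created earlier with a single projection.
tables = [
    Table("new_orders", copies=2),
    Table("old_events", copies=1),
]

print(designed_fault_tolerance, current_fault_tolerance(tables))  # 1 0
```

Giving the hypothetical old_events table a buddy (copies=2) raises the computed minimum to 1, matching the designed value.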
Adding the necessary replication to existing tables can be a very expensive operation, depending on your data size and network performance. The goal is to let database admins "flip the switch", so to speak, for new tables immediately, then go back and retroactively update (or perhaps drop, or migrate to a new table, etc.) old tables as needed, potentially during a period when the cluster is under lighter load.
The easiest way to update your tables is to run the Database Designer. If you haven't done so before, this will re-arrange their storage and likely significantly increase your query performance and decrease your disk usage. Note that this can take a while.
If you prefer to create projections manually, you will have to figure out which tables have only a single projection, and add a buddy with the same segmentation and "OFFSET 1". When you run "SELECT MARK_DESIGN_KSAFE(...)", it should tell you which projections need buddies. Alternatively, the following query will tell you which tables have only a single projection; such tables cannot be K-safe: (Note that a table can have multiple projections and still not be K-safe if those projections are segmented on different columns. So the output of "SELECT MARK_DESIGN_KSAFE(...)" should be your canonical source of information.)
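The single-projection check and the segmentation caveat in the note can be approximated with a toy Python sketch. It works on a hypothetical, hand-written list of (table, segmentation expression, offset) tuples rather than Vertica's real catalog, and counts a table as having k + 1 copies only when k + 1 of its projections share the same segmentation expression with distinct offsets. It is an illustration of the rule, not a replacement for the output of MARK_DESIGN_KSAFE.

```python
from collections import defaultdict

# Hypothetical catalog rows: (table, segmentation expression, offset).
projections = [
    ("orders", "HASH(order_id)", 0),
    ("orders", "HASH(order_id)", 1),  # buddy: same segmentation, OFFSET 1
    ("events", "HASH(event_id)", 0),  # single projection
    ("clicks", "HASH(click_id)", 0),
    ("clicks", "HASH(user_id)", 0),   # two projections, different segmentation
]


def tables_needing_buddies(rows: list[tuple[str, str, int]], k: int = 1) -> list[str]:
    # Collect the distinct offsets used by each (table, segmentation) pair.
    offsets = defaultdict(set)
    for table, seg, offset in rows:
        offsets[(table, seg)].add(offset)
    # Only projections sharing a segmentation act as buddies, so a table's
    # copy count is the best count among its segmentation groups.
    best_copies = defaultdict(int)
    for (table, _seg), offs in offsets.items():
        best_copies[table] = max(best_copies[table], len(offs))
    return sorted(t for t, copies in best_copies.items() if copies < k + 1)


print(tables_needing_buddies(projections))  # ['clicks', 'events']
```

The clicks table has two projections but still needs a buddy, because they are segmented on different columns.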
Adam