How can you test k-safety?
I have a 3-node cluster. I ran DBD on it: a fairly large query that took approx. 75 seconds before DBD now takes approx. 35 seconds.
dbadmin=> SELECT DESIGNED_FAULT_TOLERANCE, CURRENT_FAULT_TOLERANCE FROM SYSTEM;
 DESIGNED_FAULT_TOLERANCE | CURRENT_FAULT_TOLERANCE
--------------------------+-------------------------
                        1 |                       1
(1 row)
I started the fairly large query, then did a kill -9 on the vertica process on node 3.
This is the message I got on node 1:
dbadmin=> \i /home/dbadmin/sql_scripts/test_fact_self_join_wo_sysdate.sql
vsql:/home/dbadmin/sql_scripts/test_fact_self_join_wo_sysdate.sql:8: WARNING 4539: Received no response from v_usac_node0002 in abandon plan
vsql:/home/dbadmin/sql_scripts/test_fact_self_join_wo_sysdate.sql:8: WARNING 4539: Received no response from v_usac_node0002 in roll back transaction
vsql:/home/dbadmin/sql_scripts/test_fact_self_join_wo_sysdate.sql:8: ERROR 4142: Node failure during execution
dbadmin=>
I was kind of expecting the query to finish (albeit a lot slower than even 75 seconds).
Is this not the proper way to look at k-safety? How can I test k-safety in a 3-node cluster?
Answers
Sorry - I killed the process on node 2 of the cluster.
I would suggest you read a bit on K-safety. https://my.vertica.com/docs/8.0.x/HTML/Content/Authoring/Glossary/K-Safety.htm
K-safety is about the availability of your data.
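To make "availability of your data" concrete, here is a toy Python model of a 3-node cluster with K=1. It is only a sketch: the ring-style placement (each segment's extra copy on the next node) and the names `buddy_placement` and `data_available` are assumptions made for illustration, not Vertica's actual placement logic or API. It only answers whether every segment still has a live copy; it says nothing about what happens to a query that is already running when a node dies.

```python
def buddy_placement(num_nodes, k):
    """Map each segment to the set of nodes holding a copy of it.

    Assumption: segment i lives on node i and its k extra copies live on
    the next k nodes around the ring (a simplified, hypothetical layout).
    """
    return {
        seg: {(seg + offset) % num_nodes for offset in range(k + 1)}
        for seg in range(num_nodes)
    }


def data_available(placement, down_nodes):
    """Return True if every segment still has a copy on a live node."""
    down = set(down_nodes)
    return all(copies - down for copies in placement.values())


placement = buddy_placement(num_nodes=3, k=1)
print(data_available(placement, down_nodes=[1]))     # one node down
print(data_available(placement, down_nodes=[1, 2]))  # two nodes down
```

In this model, losing any single node still leaves a copy of every segment, while losing two nodes does not.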
Do not confuse k-safety with node safety. There is a good article on our KB site about this:
https://my.vertica.com/kb/KSafetyBestPractices/Content/BestPractices/KSafetyBestPractices.htm
Hope this helps make the concept clearer.