Hardware failure: what to do afterwards?
I had a 7-node cluster running and had to reinstall one node from scratch after a hardware failure.
I reinstalled Red Hat on the new node. Since I also wanted to upgrade Vertica from 188.8.131.52 to 184.108.40.206, I stopped the DB and ran the upgrade, which correctly upgraded all nodes, including the new one.
Then I restarted the DB, but the "new" node remained down.
I tried many times, always with the same result: all nodes went UP except the new one.
Then I tried "restart Vertica on host", with the same result.
So I decided to remove this host from the DB, but that was not possible, because Vertica has to be up and running for that, and it was not.
I noticed that the usual vertica process was not running on my new node, so I ran "/etc/init.d/verticad start".
It seemed to succeed, but "/etc/init.d/verticad status" reported "Not OK". When I retried, it hung for one hour without any visible activity, and no vertica process was even started.
So I am completely stuck with a node that refuses to go UP and that I cannot remove from the DB.
Is there another solution besides recreating the DB from scratch and losing all my data?
Thanks for helping.
Please open a support ticket so that we can help you in troubleshooting.
I finally found a permission problem on the new node: dbadmin could not write to the DB directory.
It would be VERY useful to see the error in admintools instead of having to look in the log file.
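For anyone who runs into the same symptom, a quick write test along these lines might have pointed at the permission problem sooner. This is only a rough sketch, not anything Vertica provides: the function name can_write_dir is made up for illustration, and the check only means something when run as dbadmin on the affected node against the actual DB directory.

```python
import os
import tempfile


def can_write_dir(path):
    """Return True if the current user can create a file inside `path`.

    Hypothetical helper for illustration; it is not part of Vertica
    or admintools.
    """
    if not os.path.isdir(path):
        return False
    try:
        # Creating a real temporary file tests the effective permissions,
        # which is more reliable than inspecting mode bits alone.
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False
```

Run as dbadmin, something like print(can_write_dir("/path/to/db/directory")) would print False when the directory is missing or not writable; the path here is a placeholder for the real DB directory.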