HW failure: what to do afterwards?
I had a 7-node cluster running and had to reinstall one node from scratch after a hardware failure.
So I reinstalled Red Hat on the new node, and then, as I wanted to upgrade Vertica from version 22.214.171.124 to 126.96.36.199, I stopped the DB and ran the upgrade, which correctly upgraded all nodes including the new one.
Then I restarted the DB, but the "new" node remained down.
I tried many times, always with the same result: all nodes went UP except the new one.
Then I tried "restart Vertica on host": same result.
So I decided to remove this host from the DB, but that failed, because Vertica has to be up and running, which was not the case.
I noticed that the usual vertica process was not running on my new node, so I ran "/etc/init.d/verticad start".
It seemed to succeed, but "/etc/init.d/verticad status" reported "Not OK", and when I retried the start, it hung for one hour without any visible activity; not even a vertica process was started.
So I am completely stuck, with a node that refuses to go UP and that I cannot remove from the DB.
Is there another solution, other than recreating the DB from scratch and losing all my data?
Thanks for helping.