Adding a new node to the database failed.
Hello,
I tried to add a second node to a single-node cluster.
After that, I added the new node to my DB. At this point, the DB is supposed to be stopped and started again.
But in my case, the DB didn't stop because of a running request I had not spotted.
Now I can't even restart my DB. I get dozens of messages like the following:
2015-03-10 15:01:13.060 nameless:0x7f3b34001a80 [Catalog] <WARNING> Error getting size of file [/mnt/sdb1/data/bbcbiprod/v_bbcbiprod_node0001_data/281/45035996281655281/45035996281655281_0.fdb]: No such file or directory
Of course, since the DB isn't running, I can't re-add the new node, or remove it.
In a word, I'm stuck...
Is there any way to recover? The first time I tried to restart, adminTools told me I had to restart at "Epoch 0", which I refused (I guess that means I have to checkpoint and would lose all my data).
Any help appreciated,
Regards,
Jean Baptiste Favre
Comments
Hello Jean,
I assume then that your database came to a stop? If you look back in vertica.log, you can see how it shut down. Did it shut down cleanly or come to an abrupt stop with an error?
The database has detected a missing data file. It doesn't seem that the error is related to attempting to add a new node. The database checks for data files at start-up, so this file could have gone missing at any time since the previous time your database started. For some users, that can be many days earlier.
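Since there are dozens of these warnings, it may help to first get a list of the distinct files the database is complaining about. Here is a rough Python sketch for that. The regular expression is modelled only on the single warning quoted above, so treat the message wording as an assumption; `missing_files` is just a helper name made up for this example, not part of any Vertica tool.

```python
import re

# Pattern based on the one warning quoted in this thread (assumption:
# other Vertica versions may word the message differently).
MISSING_FILE = re.compile(
    r"Error getting size of file \[(?P<path>[^\]]+)\]: No such file or directory"
)

def missing_files(lines):
    """Return the distinct file paths named in 'No such file' warnings,
    in the order they first appear."""
    found = []
    seen = set()
    for line in lines:
        match = MISSING_FILE.search(line)
        if match and match.group("path") not in seen:
            seen.add(match.group("path"))
            found.append(match.group("path"))
    return found

# Usage (the log location is whatever your installation uses):
#   with open("vertica.log", errors="replace") as log:
#       for path in missing_files(log):
#           print(path)
```

If the list contains only one or two files, the damage is probably limited; a long list would suggest a whole directory or disk has gone missing.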
Without this data file, the database cannot start in a consistent state. It's not clear why the file might be missing... perhaps you can investigate potential problems with the disk on your system or determine whether a privileged user erroneously deleted the data manually. Can you find 45035996281655281_0.fdb anywhere? Perhaps you can recover it from somewhere and the database will then start.
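To look for a stray copy of the file, something like the following Python sketch could be used. It only walks the directories you give it and matches on the file name; `find_by_name` is a hypothetical helper for illustration, and which directories are worth searching (other mount points, old snapshots, a trash folder) is up to you.

```python
import os

def find_by_name(filename, roots):
    """Walk each root directory and return every path whose base name
    equals `filename`. Unreadable directories are skipped silently,
    which is the default behaviour of os.walk."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if filename in filenames:
                hits.append(os.path.join(dirpath, filename))
    return hits

# Usage (search roots here are only examples):
#   for path in find_by_name("45035996281655281_0.fdb", ["/mnt", "/tmp"]):
#       print(path)
```

Finding a file with the right name does not guarantee it is the right version of the data, so it would be wise to copy it rather than move it, and keep the original where it was found.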
The recommended way forward is to restore from a backup or to let the node recover from its peers in the cluster. Because you had a 1-node database, you did not have fault tolerance / data redundancy, so recovery from peers is not an option. Do you have a backup?
The only other way "out" is to engage technical support (if you have a contract) and they can help determine which table or projection is associated with the missing data. They may be able to limit data loss to that projection alone, rather than the entire database.
Sorry to hear about your troubles. Backups are your friend! Especially for databases without data redundancy.
- Derrick