Data file corruption for large table
Our data center had a power outage a couple of weeks ago, which corrupted a couple of our databases (one PostgreSQL db and one Oracle db). More recently the corruption has also shown up in our production analytics data warehouse, which runs on Vertica.
error_messages.message gives me "Data file may be corrupt. Ensure that all hardware (disk and memory) is working properly. Possible solutions are to delete the file /var/lib/vertica/anly_dw/v_anly_dw_node0002_data/545/49539595941346545/49539595941346545_0.fdb while the node is down, and then allow the node to recover, or truncate the table data"
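In case the same message shows up for more than one file, here is a minimal Python sketch for pulling the .fdb paths out of the message text. The function name `corrupt_files` is just something I made up for illustration, and the regex assumes the paths contain no whitespace; it only parses text and does not talk to Vertica.

```python
import re

# Absolute path ending in .fdb; assumes the path contains no whitespace.
FDB_PATH = re.compile(r"/\S+?\.fdb\b")


def corrupt_files(message: str) -> list[str]:
    """Return each .fdb path mentioned in the message, in order, without duplicates."""
    found = []
    for path in FDB_PATH.findall(message):
        if path not in found:
            found.append(path)
    return found


# The message text exactly as quoted above.
message = (
    "Data file may be corrupt. Ensure that all hardware (disk and memory) is "
    "working properly. Possible solutions are to delete the file "
    "/var/lib/vertica/anly_dw/v_anly_dw_node0002_data/545/49539595941346545/"
    "49539595941346545_0.fdb while the node is down, and then allow the node "
    "to recover, or truncate the table data"
)

for path in corrupt_files(message):
    print(path)
```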
Truncating and restoring the table is an option, but it might take a couple of days. I would prefer the first approach: shut down the host, delete the corrupted file, and let the node auto-recover when the host restarts. However, my first attempt at this failed. Here is the log entry:
2017-06-12 18:41:07.179 Main:0x5f39b50 @v_anly_dw_node0002: VX001/2973: Data consistency problems found; startup aborted
HINT: Check that all file systems are properly mounted. Also, the --force option can be used to delete corrupted data and recover from the cluster
LOCATION: mainEntryPoint, /scratch_a/release/vbuild/vertica/Basics/vertica.cpp:1346
At this point I am afraid to use --force. This is the command I'm thinking of running: admintools -t restart_node -s [node_ip] -d [db_name] -p [password] -F. Will this really work and re-create the deleted .fdb file from nothing as part of the auto recover?
If this somehow hoses the cluster, I don't have much experience restoring Vertica databases. Would I need to put the original (corrupted) .fdb file back in its original location and recover to the last restore point? I need something to fall back on in case --force/-F messes something up, and not having that is what concerns me.
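To keep that fallback open, my plan is to move the file aside rather than delete it outright. Below is a minimal Python sketch of what I mean, to be run only while the node is down. The helper names `quarantine` and `restore` and the backup directory are my own assumptions, not anything Vertica provides, and I don't know whether putting the file back would actually be enough for a recovery.

```python
import shutil
from pathlib import Path


def quarantine(fdb_path: str, backup_dir: str) -> Path:
    """Move the suspect file into backup_dir and return its new location."""
    src = Path(fdb_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    if dest.exists():
        # Refuse to overwrite an earlier copy of the same file.
        raise FileExistsError(f"{dest} already exists")
    shutil.move(str(src), str(dest))
    return dest


def restore(backup_path: Path, original_path: str) -> None:
    """Move the quarantined file back to where it came from."""
    target = Path(original_path)
    if target.exists():
        # Something is already there; do not clobber it.
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(backup_path), str(target))
```

I would call `quarantine()` with the path from the error message and a directory outside the Vertica data location (for example a hypothetical /root/fdb_backup), then keep the returned path so `restore()` can put the file back if needed.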
My ultimate question: will restart_node with the -F option work if I remove the corrupt file? And if not, can I recover to the last restore point?
I have not yet investigated possible hardware issues. Hardware does not seem to be the problem here, though I'm not absolutely sure.
Thanks. Any help is greatly appreciated.