
Recovery failing after power outage

Hi, I have a test environment in which all 3 nodes (vertica-9.2.0-7) and the storage array lost power at the same time. The system is not critical, but it's weird that I cannot get the DB to recover:

[[email protected] ~]$ admintools -t start_db -d opsadb -U
Info: no password specified, using none
    Starting nodes: 
        v_opsadb_node0001 (192.168.252.31)
        v_opsadb_node0002 (192.168.252.216)
        v_opsadb_node0003 (192.168.252.217)
    Starting Vertica on all nodes. Please wait, databases with a large catalog may take a while to initialize.
    Node Status: v_opsadb_node0001: (DOWN) v_opsadb_node0002: (DOWN) v_opsadb_node0003: (DOWN) 
    Node Status: v_opsadb_node0001: (DOWN) v_opsadb_node0002: (DOWN) v_opsadb_node0003: (DOWN) 
    Node Status: v_opsadb_node0001: (DOWN) v_opsadb_node0002: (DOWN) v_opsadb_node0003: (DOWN) 
    Node Status: v_opsadb_node0001: (DOWN) v_opsadb_node0002: (DOWN) v_opsadb_node0003: (DOWN) 
    Node Status: v_opsadb_node0001: (DOWN) v_opsadb_node0002: (DOWN) v_opsadb_node0003: (DOWN) 
    Node Status: v_opsadb_node0001: (DOWN) v_opsadb_node0002: (DOWN) v_opsadb_node0003: (DOWN) 
    Node Status: v_opsadb_node0001: (UP) v_opsadb_node0002: (UP) v_opsadb_node0003: (UP) 
Database opsadb: Startup Succeeded.  All Nodes are UP
[[email protected] ~]$ vsql 
...
dbadmin=> SELECT get_ahm_epoch();
 get_ahm_epoch 
---------------
     109626381
(1 row)
dbadmin=> SELECT get_expected_recovery_epoch();
INFO 4544:  Recovery Epoch Computation:
Node Dependencies:
011 - cnt: 847
101 - cnt: 847
110 - cnt: 847
111 - cnt: 158

001 - name: v_opsadb_node0001
010 - name: v_opsadb_node0002
100 - name: v_opsadb_node0003
Nodes certainly in the cluster:
    Node 2(v_opsadb_node0003), epoch 109610889
    Node 1(v_opsadb_node0002), epoch 104348833
Filling more nodes to satisfy node dependencies:
Data dependencies fulfilled, remaining nodes LGEs don't matter:
    Node 0(v_opsadb_node0001), epoch 104348128
--
 get_expected_recovery_epoch 
-----------------------------
                   104348833
(1 row)
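
My reading of that output: the two nodes listed as certainly in the cluster have LGEs 109610889 and 104348833, and the reported recovery epoch equals the smaller of the two. Here is a minimal Python sketch of that reading. The helper name `expected_recovery_epoch` and the "minimum LGE over the required nodes" rule are my own assumptions based on this one output, not Vertica's documented algorithm.

```python
# Hypothetical helper illustrating one reading of the INFO 4544 output.
# Assumption: the recovery epoch is the smallest LGE among the nodes
# that are required to satisfy the data dependencies.

def expected_recovery_epoch(lge_by_node, required_nodes):
    """Return the smallest last good epoch among the required nodes."""
    return min(lge_by_node[name] for name in required_nodes)

# LGEs copied from the output above.
lge_by_node = {
    "v_opsadb_node0001": 104348128,
    "v_opsadb_node0002": 104348833,
    "v_opsadb_node0003": 109610889,
}

# Nodes listed under "Nodes certainly in the cluster".
required = ["v_opsadb_node0003", "v_opsadb_node0002"]

print(expected_recovery_epoch(lge_by_node, required))
```

With these inputs the sketch prints 104348833, matching the value returned by get_expected_recovery_epoch().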

So far so good, we should be able to recover to 104348833. But:

[[email protected] ~]$ admintools -t restart_db -d opsadb -e '104348833' -p xxx
Invalid value for last good epoch: '104348833'
Epoch number must be 'last' or between 109626381 and 104348833 inclusive

I have tried various values for the epoch, including the two values named in the message, and always get the same error. Now my questions:
1. The range is backwards, AHM > LGE. Is that the reason for the (poor!) error message? Is it that my recovery epoch must be >=AHM and <=LGE, and therefore I don't have a chance of (normal) recovery here?
2. Can someone come up with a hypothesis about what must have gone wrong to end up like this? As mentioned, the power loss affected the 3 nodes and the storage, but no disks were damaged. I thought the DB could always be recovered as long as the files that made it to disk are unharmed ...
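
To make question 1 concrete, here is a small Python sketch of the range check that the error message seems to imply. The rule (AHM <= epoch <= LGE) is my guess from the wording of the message, and `epoch_in_range` is a made-up name, not an admintools function.

```python
# Hypothetical check inferred from the error message wording;
# not taken from the admintools source.

def epoch_in_range(epoch, ahm, lge):
    """Accept an epoch only if ahm <= epoch <= lge."""
    return ahm <= epoch <= lge

AHM = 109626381  # from get_ahm_epoch()
LGE = 104348833  # from get_expected_recovery_epoch()

# With AHM > LGE the interval is empty, so every candidate is rejected,
# including both endpoints and anything in between.
candidates = [LGE, AHM, (AHM + LGE) // 2]
print([epoch_in_range(e, AHM, LGE) for e in candidates])
```

If the check really works like this, no epoch value can pass while AHM is greater than LGE, which would explain why every value I tried gives the same error.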

I know (after reading https://softwaresupport.softwaregrp.com/doc/KM03449287) that I can try to salvage table data, but more than 2000 projections are affected. As it is a test environment, I will just revert to an older snapshot; this post is only for my curiosity.
Thank you for any insights!
