LGE set very low - unable to start the DB now
We have a single Vertica server that, after a server crash, somehow ended up with its LGE (last good epoch) set to 5. The database won't start. According to projection_checkpoint_epochs, no projections are behind the AHM (ancient history mark), so abortrecovery doesn't help to start the database.
Whenever we try to pass an LGE greater than the AHM such as this:
admintools -t restart_db -d ourdatabase -e <a value greater than the AHM>
we get an error saying that the epoch should be between the AHM and 5, which is impossible because the AHM is currently greater than 5.
Any suggestions please?
**Update:** We've found that normally, when the current LGE is printed in vertica.log, it is preceded by a full list of projections with their corresponding CPEs, like so:
<INFO> Get Local Node LGE: projection etl.table1_super, CPE is 0x3b1989
<INFO> Get Local Node LGE: projection etl.table2_super, CPE is 0x3b1989
...
But in our case, we get this instead:
<INFO> Found 4 missing DFS files
<INFO> Rollback Txn: a0000002c51162 'DFSUtil::getLocalNodeLGE'
<INFO> Rollback Txn: a0000002c51163 'ProjUtil::getLocalNodeLGE
followed by
<INFO> My local node LGE = 0x5 and current epoch = 0x3b3367
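The epochs in these log lines are hexadecimal, while the AHM and LGE reported by the system table are plain integers, so it helps to convert before comparing. A minimal sketch, assuming a POSIX shell; hex_to_dec and parse_lge_line are hypothetical helper names, not part of Vertica:

```shell
#!/bin/sh
# Convert a hexadecimal epoch such as 0x3b3367 to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

# Hypothetical helper: extract the LGE and the current epoch from a
# "My local node LGE = ... and current epoch = ..." log line and
# print both values in decimal.
parse_lge_line() {
    line=$1
    lge=$(printf '%s\n' "$line" | sed -n 's/.*LGE = \(0x[0-9a-fA-F]*\).*/\1/p')
    cur=$(printf '%s\n' "$line" | sed -n 's/.*current epoch = \(0x[0-9a-fA-F]*\).*/\1/p')
    printf 'lge=%d current_epoch=%d\n' "$lge" "$cur"
}

parse_lge_line '<INFO> My local node LGE = 0x5 and current epoch = 0x3b3367'
# prints: lge=5 current_epoch=3879783
```

By the same conversion, the CPE 0x3b1989 from the normal log output above is 3873161.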
Best Answer
Thanks for the answer, SruthiA. Are there any "normal" circumstances you can think of, under which LGE can sink below min(projection_checkpoint_epochs.checkpoint_epoch)? If so, what are some of those situations?
Thanks for helping!
Answers
Just to give an update: I was able to produce a low-LGE situation on another machine, but this time I could easily find the projections that caused it by querying projection_checkpoint_epochs. Basically, the system LGE = min(checkpoint_epoch) from projection_checkpoint_epochs. This makes sense and I know how to deal with it. But on the first system it doesn't work like that for some reason.
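For reference, that comparison can be made with queries along these lines. This is only a sketch: it assumes the database is up, that vsql is on the PATH, and that projection_checkpoint_epochs exposes node_name, projection_schema and projection_name columns alongside checkpoint_epoch; ourdatabase is the placeholder name used earlier in the thread.

```shell
# System-wide AHM and LGE.
vsql -d ourdatabase -c "SELECT ahm_epoch, last_good_epoch FROM system;"

# Lowest checkpoint epoch across all projections; on the second machine
# this matched the system LGE.
vsql -d ourdatabase -c "SELECT MIN(checkpoint_epoch) FROM projection_checkpoint_epochs;"

# Projections with the lowest checkpoint epochs, i.e. the ones holding the LGE back.
vsql -d ourdatabase -c "
SELECT node_name, projection_schema, projection_name, checkpoint_epoch
FROM projection_checkpoint_epochs
ORDER BY checkpoint_epoch
LIMIT 10;"
```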
@dimitri_p : That is good to know. For the first system, if you still need assistance, please open a support case, as it requires a Webex session.
I have updated the information in the first post. Does that give any clues? How do we recover the missing DFS files?
I just reviewed it and I think we can fix it using the catalog editor, but that requires a Webex session, so please open a support case.
For anyone reading this who has a similar issue: ours got fixed by using a solution from the following thread:
https://forum.vertica.com/discussion/243077/vertica-startup-failure-due-to-power-cut-asr-required
and I also list it here:
-- Try the following but please note that you might lose data
-- Start the database in unsafe mode:
admintools -t start_db -d YourDBname -p DBpassword -U
-- If DFS files are missing, remove the DFS files:
set session characteristics as transaction read write;
\i /opt/vertica/packages/logsearch/ddl/uninstall.sql
-- Check LGE and AHM again; LGE should now be >= AHM.
select ahm_epoch,last_good_epoch from system;
-- Stop the db
admintools -t stop_db -d DBname
-- Try to start it normally
admintools -t start_db -d DBname
-- Connect to the db and recover the DFS files by running the command below
\i /opt/vertica/packages/logsearch/ddl/install.sql