LGE set very low - unable to start the DB now

edited December 17 in General Discussion

We have a single Vertica server that, after a server crash, somehow ended up with its LGE (last good epoch) set to 5, and the database won't start. According to projection_checkpoint_epochs, no projections are behind the AHM (ancient history mark), so abortrecovery doesn't help to start the database.

Whenever we try to pass an LGE greater than the AHM, such as this:

admintools -t restart_db -d ourdatabase -e <a value greater than the AHM>

we get an error saying that the epoch must be between the AHM and 5, which is impossible because the AHM is currently greater than 5.
Any suggestions please?
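
The contradiction in that error can be spelled out with a small check. This is only a sketch of the reasoning, assuming the accepted range is the inclusive interval from the AHM to the LGE; the actual admintools validation is not shown in this thread, and the AHM value used below is a made-up placeholder.

```python
def allowed_restart_epochs(ahm, lge):
    """Return the (low, high) epoch range implied by the error message,
    or None when the range is empty.

    Assumption: bounds are inclusive; the real validation may differ.
    """
    return (ahm, lge) if ahm <= lge else None


# Hypothetical AHM; the thread only says it is greater than 5.
hypothetical_ahm = 3873161
print(allowed_restart_epochs(hypothetical_ahm, 5))  # no epoch can satisfy both bounds
```

With the LGE below the AHM, no value passed via -e can fall inside the range, which matches the behavior described above.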

**Update:** we've found that normally, when the current LGE is printed in vertica.log, it is preceded by a full list of projections with their corresponding CPEs, like so:

<INFO> Get Local Node LGE: projection etl.table1_super, CPE is 0x3b1989
<INFO> Get Local Node LGE: projection etl.table2_super, CPE is 0x3b1989
...

But in our case, we get this instead:

<INFO> Found 4 missing DFS files
<INFO> Rollback Txn: a0000002c51162 'DFSUtil::getLocalNodeLGE'
<INFO> Rollback Txn: a0000002c51163 'ProjUtil::getLocalNodeLGE

followed by

<INFO> My local node LGE = 0x5 and current epoch = 0x3b3367
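
The epochs in these log lines are hexadecimal, so it can help to pull them out and compare them in decimal. Below is a minimal Python sketch that does this for the two line shapes quoted above. The regular expressions are assumptions based only on these excerpts, and `summarize_lge` is a hypothetical helper name; other Vertica versions may format the lines differently.

```python
import re

# Patterns mirror the log excerpts quoted in this post (an assumption, not a spec).
CPE_RE = re.compile(
    r"Get Local Node LGE: projection ([^,\s]+), CPE is (0x[0-9a-fA-F]+)"
)
LGE_RE = re.compile(
    r"My local node LGE = (0x[0-9a-fA-F]+) and current epoch = (0x[0-9a-fA-F]+)"
)


def summarize_lge(log_lines):
    """Collect per-projection CPEs and the reported LGE/current epoch as integers."""
    cpes = {}
    lge = current = None
    for line in log_lines:
        m = CPE_RE.search(line)
        if m:
            cpes[m.group(1)] = int(m.group(2), 16)
            continue
        m = LGE_RE.search(line)
        if m:
            lge, current = int(m.group(1), 16), int(m.group(2), 16)
    return {
        "cpes": cpes,
        "min_cpe": min(cpes.values()) if cpes else None,
        "lge": lge,
        "current_epoch": current,
    }


sample = [
    "<INFO> Found 4 missing DFS files",
    "<INFO> My local node LGE = 0x5 and current epoch = 0x3b3367",
]
print(summarize_lge(sample))
```

For the failing case, no CPE lines are found at all, and the reported LGE of 0x5 (5) sits far below the current epoch of 0x3b3367 (3879783).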

Best Answer

  • Answer ✓

    Thanks for the answer, SruthiA. Are there any "normal" circumstances you can think of, under which LGE can sink below min(projection_checkpoint_epochs.checkpoint_epoch)? If so, what are some of those situations?
    Thanks for helping!

Answers

  • Just to give an update: I was able to reproduce a low-LGE situation on another machine, but this time I could easily find the projections that caused it by querying projection_checkpoint_epochs. Basically, the system LGE = min(checkpoint_epoch) from projection_checkpoint_epochs. This makes sense, and I know how to deal with it. But on the first system it doesn't work like that for some reason.
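
The relationship described in this post, LGE = min(checkpoint_epoch), can be illustrated with a few lines of Python. This is a sketch over made-up rows, not output from a real system: the `Checkpoint` type, the `lagging_projections` helper, and the projection name `etl.stale_super` are all hypothetical.

```python
from typing import NamedTuple


class Checkpoint(NamedTuple):
    projection: str
    checkpoint_epoch: int


def lagging_projections(rows):
    """Return (minimum checkpoint epoch, projections sitting at that epoch)."""
    if not rows:
        return None, []
    low = min(r.checkpoint_epoch for r in rows)
    return low, [r.projection for r in rows if r.checkpoint_epoch == low]


# Hypothetical rows, shaped like a projection/checkpoint_epoch listing.
rows = [
    Checkpoint("etl.table1_super", 3873161),
    Checkpoint("etl.table2_super", 3873161),
    Checkpoint("etl.stale_super", 5),
]
low, names = lagging_projections(rows)
print(low, names)
```

In this made-up data, the one projection with a checkpoint epoch of 5 drags the minimum down, which is the pattern seen on the second machine. On the first system no such projection showed up, so this rule alone does not explain the LGE there.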

  • SruthiA Administrator

    @dimitri_p : That is good to know. For the first system, if you still need assistance, please open a support case, as it requires a Webex session.

  • @SruthiA said:
    @dimitri_p : That is good to know. For the first system, if you still need assistance, please open a support case, as it requires a Webex session.

    I have updated the information in the first post. Does that give any clues? How do we recover the missing DFS files?

  • SruthiA Administrator

    I just reviewed it, and I think we can fix it using the catalog editor, but that requires a Webex session, so please open a support case.

  • edited December 19

    For anyone reading this with a similar issue: ours was fixed using a solution from the following thread:

    https://forum.vertica.com/discussion/243077/vertica-startup-failure-due-to-power-cut-asr-required

    and I also list it here:

    -- Try the following, but note that you might lose data.
    -- Start the database in unsafe mode:
    admintools -t start_db -d YourDBname -p DBpassword -U
    -- If DFS files are missing, remove them (run in vsql):
    set session characteristics as transaction read write;
    \i /opt/vertica/packages/logsearch/ddl/uninstall.sql
    -- Check the LGE and AHM again; the LGE should now be >= the AHM.
    select ahm_epoch, last_good_epoch from system;
    -- Stop the database:
    admintools -t stop_db -d YourDBname
    -- Try to start it normally:
    admintools -t start_db -d YourDBname
    -- Connect to the database and recover the DFS files by running:
    \i /opt/vertica/packages/logsearch/ddl/install.sql
