Error starting database, no nodes are up

I have a three-node cluster. I am now unable to start the database; I get the error:

        Error starting database, no nodes are up

How can we resolve this?

Comments

  • Check these files; they should give more detail on why the database can't start.

    1- vertica.log, which you will find in <<database>>/v_<<database>>_node<<NNNN>>_catalog/
    2- dbLog, which you will find in the database folder.

    Hope this helps,

    Eugenia
  • Hi,

    I am posting the last part of the log files:

    Vertica.log

    2014-04-11 10:25:28.859 Main:0x4b92680 [SAL] <INFO> Queueing directory [/catalog/ubossdb/v_ubossdb_node0001_data/371/45035997392014371.del.del.del.del.del.del.del.del.del.del] for removal
    2014-04-11 10:25:28.863 Main:0x4b92680 [SAL] <INFO> Queueing directory [/catalog/ubossdb/v_ubossdb_node0001_data/087/45035997392014087.del.del.del.del.del.del.del.del.del.del] for removal
    2014-04-11 10:25:28.864 Main:0x4b92680 [SAL] <INFO> Queueing directory [/catalog/ubossdb/v_ubossdb_node0001_data/087/45035997392012087.del.del.del.del.del.del.del.del.del.del] for removal
    2014-04-11 10:25:28.992 Main:0x4b92680 [Catalog] <INFO> Checking 23947 files in 2 threads
    2014-04-11 10:25:41.957 Main:0x4b92680 [Recover] <INFO> Loading UDx libraries
    2014-04-11 10:25:41.957 Main:0x4b92680 [Recover] <INFO> Setting up UDx pointers
    2014-04-11 10:25:41.961 Main:0x4b92680 <PANIC> @v_ubossdb_node0001: VX001/2973: Data consistency problems found; startup aborted
            HINT:  Check that all file systems are properly mounted.  Also, the --force option can be used to delete corrupted data and recover from the cluster
            LOCATION:  mainEntryPoint, /scratch_a/release/vbuild/vertica/Basics/vertica.cpp:1166
    2014-04-11 10:25:42.092 Main:0x4b92680 [Main] <PANIC> Wrote backtrace to ErrorReport.txt

    dbLog

    04/11/14 10:24:51 SP_connect: DEBUG: Auth list is: NULL
  • As you can see, there are data consistency problems, possibly caused by corrupted files. You may have lost data.
    2014-04-11 10:25:41.961 Main:0x4b92680 <PANIC> @v_ubossdb_node0001: VX001/2973: Data consistency problems found; startup aborted
            HINT:  Check that all file systems are properly mounted.  Also, the --force option can be used to delete corrupted data and recover from the cluster
    How many nodes do you have? If you have 3, this should not be a problem, because the data can be recovered from the other nodes; but if this is a 1-node database, you may have lost data.

    Let me know how many nodes you have.

    Eugenia

  • 3-node cluster, but in VMware.
  • I can access all three nodes. Can you tell me the procedure to recover the data?
  • It should be OK if it is VMware. Do all 3 nodes show this PANIC?
  • vertica.log from node2 and node3:

    node2 

     Data consistency problems found; startup aborted
            HINT:  Check that all file systems are properly mounted.  Also, the --force option can be used to delete corrupted data and recover from the cluster
            LOCATION:  mainEntryPoint, /scratch_a/release/vbuild/vertica/Basics/vertica.cpp:1166
    2014-04-11 10:25:48.808 Main:0x66eb680 [Main] <PANIC> Wrote backtrace to ErrorReport.txt

    node3

    2014-04-11 10:25:43.466 Main:0x5851680 [Recover] <INFO> Loading UDx libraries
    2014-04-11 10:25:43.466 Main:0x5851680 [Recover] <INFO> Setting up UDx pointers
    2014-04-11 10:25:43.471 Main:0x5851680 <PANIC> @v_ubossdb_node0003: VX001/2973: Data consistency problems found; startup aborted
            HINT:  Check that all file systems are properly mounted.  Also, the --force option can be used to delete corrupted data and recover from the cluster
            LOCATION:  mainEntryPoint, /scratch_a/release/vbuild/vertica/Basics/vertica.cpp:1166
    2014-04-11 10:25:43.593 Main:0x5851680 [Main] <PANIC> Wrote backtrace to ErrorReport.txt

  • Are you sure the data directory is mounted? It is odd that all 3 nodes have the same problem.

    Try restarting from the last good epoch using admintools: in admintools, go to the Advanced Menu, choose the option to roll back to the last good epoch, and see what epoch number it gives you.

    Let me know. 
    Eugenia
  • Hi,

    I can see the files in the data directory.

    The last good epoch number is 0.

    Najeeb
  • If you are an Enterprise Edition customer, please open a support case; there might be a projection whose CPE (checkpoint epoch) has been set to zero for some reason.

    Prasanta_Pal
  • We are using Community Edition on a 3-node cluster. What can I do?
  • Hi,

    Can anybody help me?

    Thanks,
    Najeeb
  • Hi Najeeb,

    Unfortunately, it sounds like you have lost your data. Vertica is unable to recover a consistent copy of your database; the newest complete, uncorrupted version of the database that is available is from epoch 0, i.e., a new database.

    Running three separate nodes on VMware, especially if they're on the same underlying physical hardware (?), is a risky proposition.  If your disk stops working, or if the underlying operating system is ignoring requests in the VM to flush data to disk, etc., then you can easily lose data; sometimes a lot of data.

    It may be possible to extract some data from the internal data files; or to drop corrupt tables and enable the system to restore to a consistent view for just some of your tables.  Unfortunately, I don't think that this process is currently documented, nor is it automated; it can end up involving a bunch of low-level hacking of the internals of the files inside the database.  Not something (in the general case) that would be amenable to a simple walk-through.  For folks with a support contract, this is a case where our support folks could walk you through the process.  Short of that, maybe there are third-party consultants out there who have figured out the process and would be willing to help you through it, to see how much data is recoverable?  The alternative would be to start over and restore from a backup.

    Adam
  • If anyone does know any specific tips, they are of course welcome to post them.

    Hacking on the internals of Vertica's data store is very much an "under-Support-guidance-only" type of thing. If you do something wrong, you can easily make things worse rather than better. That said, if you don't have access to Support, and you have nothing to lose / can't get at your data anyway, and someone has some ideas...
  • Thanks Adam
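The per-node check done by hand in this thread (finding the PANIC line in each node's vertica.log and comparing the nodes) can be scripted. This is a minimal Python sketch, not a Vertica tool: it assumes the PANIC line format seen in the log excerpts posted here, and the function names find_panic and summarize are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Dict, Optional

# Assumed format, based on the excerpts in this thread, e.g.:
# "2014-04-11 10:25:41.961 Main:0x4b92680 <PANIC> @v_ubossdb_node0001: VX001/2973: Data consistency ..."
# Lines such as "[Main] <PANIC> Wrote backtrace ..." carry no node/error code and do not match.
PANIC_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ <PANIC> "
    r"@(?P<node>\S+): (?P<code>[A-Z0-9]+/\d+): (?P<msg>.*)$"
)


def find_panic(log_text: str) -> Optional[Dict[str, str]]:
    """Return the last node-tagged PANIC entry in a log excerpt, or None."""
    last = None
    for line in log_text.splitlines():
        m = PANIC_RE.match(line.strip())
        if m:
            last = m.groupdict()
    return last


def summarize(logs: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each log label (e.g. a file path) to its PANIC error code, or None."""
    result: Dict[str, Optional[str]] = {}
    for label, text in logs.items():
        panic = find_panic(text)
        result[label] = panic["code"] if panic else None
    return result


if __name__ == "__main__":
    # Hypothetical usage: python scan_panic.py <node1 vertica.log> <node2 vertica.log> ...
    logs = {p: Path(p).read_text(errors="replace") for p in sys.argv[1:]}
    for path, code in summarize(logs).items():
        print(f"{path}: {code or 'no PANIC found'}")
```

If every node reports the same code (here VX001/2973), the problem is cluster-wide rather than a single bad node, which is the situation discussed in this thread. The log paths depend on your catalog location.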

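Prasanta's remark about a projection with a CPE of zero and Adam's conclusion that the only consistent version is epoch 0 can be connected with a deliberately simplified model. It assumes (a simplification, not necessarily how Vertica computes it internally) that the last good epoch is the oldest checkpoint epoch across all projections, since the database can only roll back to an epoch at which every projection's data is on disk. The function name, projection names, and epoch numbers below are hypothetical.

```python
from typing import Dict


def last_good_epoch(checkpoint_epochs: Dict[str, int]) -> int:
    """Simplified model (assumption): the last good epoch is the minimum
    checkpoint epoch over all projections."""
    if not checkpoint_epochs:
        raise ValueError("no projections given")
    return min(checkpoint_epochs.values())


# Hypothetical projections and checkpoint epochs, for illustration only.
healthy = {"orders_b0": 1450, "orders_b1": 1450, "customers_b0": 1432}
one_bad = {**healthy, "customers_b1": 0}  # one projection's CPE stuck at 0

print(last_good_epoch(healthy))  # 1432
print(last_good_epoch(one_bad))  # 0: rolling back would give an empty database

# In this model, excluding the corrupt projection (comparable to Adam's idea of
# dropping corrupt tables) lets the remaining data roll back to a non-zero epoch.
without_bad = {name: cpe for name, cpe in one_bad.items() if cpe > 0}
print(last_good_epoch(without_bad))  # 1432
```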