Vertica Crash

edited March 2017 in General Discussion

We are currently running Vertica 7.2.3-11 on a single node.

Vertica goes down regularly, every 4-5 days, and when we try to start it again it takes a very long time, nearly an hour.

This has happened about 5 times now.

I tailed vertica.log and don't see any PANIC, FATAL, or ERROR entries.

I didn't see anything reported in ErrorReport.txt either.
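Tailing only shows the end of the file, so entries written just before the crash can be missed. As a rough sketch (not a Vertica tool), a short Python helper can search the whole log for the three severity words mentioned above. The function name `find_severe_lines` is made up for illustration, and the match is a plain case-sensitive word search that assumes nothing else about the log layout.

```python
import re
from typing import Iterable, List, Tuple

# Case-sensitive whole-word match on the three severities named in the post.
# This is an assumption about how they appear in the log, not a documented format.
SEVERITY_PATTERN = re.compile(r"\b(PANIC|FATAL|ERROR)\b")


def find_severe_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing a severity word."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SEVERITY_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

It accepts any iterable of lines, so an open file handle for vertica.log can be passed in directly and the whole file is read rather than only the tail.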

I have one more issue: when I try to restart, the database takes more than an hour to come up. While it is coming up, this is what I observe in the log:

2017-03-17 04:12:08.875 Main:0x90e99e0 [ResourceManager] pool recovery - Queries: 1 Threads: 5216 File Handles: 33458 Memory(KB): 6302166

After the above log line, startup is stuck and does not progress for about 30-45 minutes.


Once it moves on, I see the following log line:

2017-03-17 04:12:50.488 Main:0x90e99e0 [Catalog] Queueing unknown file in storage directory for removal [/db_data/data/drdata/v_drdata_node0001_data/409/025b8f57674579e47ef88a72848fd19d00a00000a7d4e579_0.gt]

After the above log line, it is stuck again for about 30 minutes.


Once it moves on, I see the following log line:

2017-03-17 04:43:43.901 unknown:0x7f424de79780 [SAL] Unmounting file system 2(Libhdfs++ File System).

After the above log line, I get this error on the terminal where I started the restart:

Error starting database, no nodes are up
Press RETURN to continue


Then I get this message:

"Database startup failed, but enough information is available to start the database from a previous epoch.

Do you really want to restart the database from epoch 15981105?"

Once I enter Yes here, the database comes up.

Has anyone faced this issue and found a way to fix it?
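To pin down where startup actually stalls, it can help to measure the gaps between consecutive log timestamps instead of estimating them by eye. The sketch below assumes every relevant line starts with a timestamp in the `YYYY-MM-DD HH:MM:SS.mmm` form seen in the excerpts above. The names `parse_timestamp` and `find_stalls` and the 60-second default threshold are illustrative choices, not anything from Vertica.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Timestamp layout taken from the log excerpts quoted in the post.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TIMESTAMP_LENGTH = len("2017-03-17 04:12:08.875")


def parse_timestamp(line: str) -> datetime:
    """Parse the leading timestamp of a log line; raises ValueError if absent."""
    return datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)


def find_stalls(lines: Iterable[str],
                threshold_seconds: float = 60.0) -> List[Tuple[float, str]]:
    """Return (gap_in_seconds, line_before_gap) for every gap >= threshold."""
    stalls = []
    previous = None  # (timestamp, line) of the last line that parsed
    for line in lines:
        try:
            current = parse_timestamp(line)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None:
            gap = (current - previous[0]).total_seconds()
            if gap >= threshold_seconds:
                stalls.append((gap, previous[1].rstrip("\n")))
        previous = (current, line)
    return stalls
```

Applied to only the three excerpts quoted above, the timestamps are about 42 seconds and about 31 minutes apart, so with the default threshold it would flag only the interval following the `[Catalog]` line. Running it over the full vertica.log would give a more complete picture, since the excerpts skip whatever was logged in between.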

Comments

  • [Deleted User] Administrator

    A Vertica database that goes down as a result of a crash may lose data from the most recent transactions that still had data in WOS memory before the crash occurred. When the administrator starts the database, admintools prompts the DBA with a good recovery epoch number and timestamp to accept. Data loaded after this timestamp has to be reloaded once database startup is complete. This is the normal startup process for a database that crashed.

    The question is why you are seeing crashes. Have you checked /var/log/messages and dblog? If you can, please upgrade to 7.2.3-16, which has some memory-related fixes. Check the release notes for more details.
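    One thing worth looking for in /var/log/messages is whether the Linux kernel's out-of-memory killer terminated the process. That is only an assumption here, since nothing in the thread confirms the cause. A minimal sketch follows, with `find_oom_events` as a hypothetical helper name and the search phrases taken from typical kernel OOM messages, whose exact wording varies between kernel versions.

```python
import re
from typing import Iterable, List, Optional

# Phrases commonly seen in kernel OOM-killer messages; wording differs across
# kernel versions, so treat this pattern as an assumption rather than a spec.
OOM_PATTERN = re.compile(r"oom-killer|out of memory|killed process", re.IGNORECASE)


def find_oom_events(lines: Iterable[str],
                    process_name: Optional[str] = None) -> List[str]:
    """Return syslog lines that look like OOM-killer activity.

    If process_name is given, keep only lines that also mention it.
    """
    events = []
    for line in lines:
        if not OOM_PATTERN.search(line):
            continue
        if process_name is not None and process_name.lower() not in line.lower():
            continue
        events.append(line.rstrip("\n"))
    return events
```

    Passing an open handle for /var/log/messages with `process_name="vertica"` narrows the result to lines that mention the database process. An empty result does not rule out a memory problem, it only means the OOM killer left no matching lines in that file.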
