Vertica Crash
We are currently running v7.2.3-11 on a single node.
Vertica goes down regularly after 4-5 days, and when I attempt to start it, it takes a very long time, nearly an hour.
This has happened nearly 5 times now.
I tailed vertica.log and don't see any PANIC, FATAL, or ERROR entries.
I didn't see anything reported in ErrorReport.txt either.
The second issue is the restart itself, which takes more than an hour. While it is coming up, this is what I observe:
2017-03-17 04:12:08.875 Main:0x90e99e0 [ResourceManager] pool recovery - Queries: 1 Threads: 5216 File Handles: 33458 Memory(KB): 6302166
After the above log line it is stuck and does not move for about 30-45 minutes.
Once it moves, I see the log below:
2017-03-17 04:12:50.488 Main:0x90e99e0 [Catalog] Queueing unknown file in storage directory for removal [/db_data/data/drdata/v_drdata_node0001_data/409/025b8f57674579e47ef88a72848fd19d00a00000a7d4e579_0.gt]
After the above log line it is stuck again for about 30 minutes.
Once it moves, I see the log below:
2017-03-17 04:43:43.901 unknown:0x7f424de79780 [SAL] Unmounting file system 2(Libhdfs++ File System).
After the above log line, I get this error on the terminal where I ran the restart:
Error starting database, no nodes are up
Press RETURN to continue
Then I get this prompt:
"Database startup failed, but enough information is available to start the database from a previous epoch.
Do you really want to restart the database from epoch 15981105?"
Once I enter Yes here, the database comes up.
I want to find out whether anyone has faced this issue and found a way to fix it.
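To pin down where the startup actually stalls, the gaps between consecutive log timestamps can be measured instead of estimated by watching the tail. Below is a minimal Python sketch, assuming every relevant vertica.log line starts with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp as in the excerpts above. The function name `find_stalls` and the 30-second default threshold are illustrative choices, not anything provided by Vertica.

```python
import re
from datetime import datetime

# Assumption: log lines start with "YYYY-MM-DD HH:MM:SS.mmm", as in the
# vertica.log excerpts quoted in this thread.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def find_stalls(lines, threshold_seconds=30.0):
    """Return (gap_seconds, line_before, line_after) for every gap between
    consecutive timestamped lines that exceeds threshold_seconds."""
    stalls = []
    prev_ts = None
    prev_line = None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > threshold_seconds:
                stalls.append((gap, prev_line, line))
        prev_ts, prev_line = ts, line
    return stalls
```

Passing an open file handle for vertica.log to `find_stalls` returns the stalls in log order. Applied to just the three excerpts quoted above, it reports a gap of roughly 42 seconds between the first and second lines and roughly 31 minutes between the second and third, so running it over the full log may show more precisely which step is slow.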
Comments
A Vertica database that goes down as a result of a crash may lose the most recent transactions whose data was still in WOS memory when the crash occurred. When the administrator starts the database, admintools prompts the DBA to accept a good recovery epoch number and timestamp. Data loaded after this timestamp has to be reloaded once database startup is complete. This is the normal startup process for a database that crashed.
The question is why you are seeing crashes. Have you checked /var/log/messages and dblog? If you can, please upgrade to 7.2.3-16, which has some memory-related fixes. Check the release notes for more details.
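One thing worth looking for in /var/log/messages is whether the Linux OOM killer terminated the Vertica process. That is only a hypothesis, since nothing in this thread confirms a memory cause. Here is a rough Python sketch, assuming the kernel messages contain the usual phrases "invoked oom-killer", "Out of memory", or "Killed process"; the function name `find_oom_events` and the `process_hint` parameter are made up for illustration.

```python
import re

# Phrases the Linux kernel typically logs when the OOM killer acts.
# Assumption: the crash may be memory-related; this thread does not confirm it.
OOM_RE = re.compile(r"invoked oom-killer|Out of memory|Killed process",
                    re.IGNORECASE)


def find_oom_events(lines, process_hint="vertica"):
    """Return lines that look like OOM-killer activity, listing the ones
    that mention process_hint first (original order kept within each group)."""
    hits = [line.rstrip("\n") for line in lines if OOM_RE.search(line)]
    # sorted() is stable: key False (hint present) sorts before True.
    return sorted(hits, key=lambda hit: process_hint not in hit.lower())
```

Passing an open file handle for /var/log/messages (reading it usually requires root) returns the matching lines. An empty result does not rule out other causes, so dblog is still worth checking as suggested above.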