We are currently running Vertica v7.2.3-11 on a single node.
Vertica goes down regularly every 4-5 days, and when I try to start it again it takes a very long time, nearly an hour.
This has happened about 5 times now.
I tailed vertica.log and don't see any PANIC, FATAL, or ERROR entries.
I didn't see anything reported in ErrorReport.txt either.
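
Tailing only shows the end of the file, so here is a minimal sketch of scanning a whole log for the same three severity words. It assumes the words appear as whole uppercase words somewhere in the line; the function name `find_severe_lines` and the second sample line are made up for illustration and are not real Vertica output.

```python
import re

# Severity words checked in the post. Assumption: they appear as whole,
# uppercase words in a log line (for example inside angle brackets).
SEVERITY = re.compile(r"\b(PANIC|FATAL|ERROR)\b")


def find_severe_lines(lines):
    """Return (line_number, text) for each line that contains a severity word."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if SEVERITY.search(text):
            hits.append((number, text.rstrip("\n")))
    return hits


# First line is quoted from the post; the second is invented to exercise the filter.
sample = [
    "2017-03-17 04:12:08.875 Main:0x90e99e0 [ResourceManager] pool recovery - "
    "Queries: 1 Threads: 5216 File Handles: 33458 Memory(KB): 6302166",
    "EXAMPLE ONLY: <ERROR> this line is invented for illustration",
]

for number, text in find_severe_lines(sample):
    print(number, text)
```

Any iterable of lines works, so an open file handle for vertica.log can be passed in place of `sample`; the path depends on the installation.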
The second issue is the restart itself: it takes more than an hour to come up. While it is coming up, this is what I observe:
2017-03-17 04:12:08.875 Main:0x90e99e0 [ResourceManager] pool recovery - Queries: 1 Threads: 5216 File Handles: 33458 Memory(KB): 6302166
After the above log line it is stuck and does not move for about 30-45 mins.
Once it moves on, I see the log line below:
2017-03-17 04:12:50.488 Main:0x90e99e0 [Catalog] Queueing unknown file in storage directory for removal [/db_data/data/drdata/v_drdata_node0001_data/409/025b8f57674579e47ef88a72848fd19d00a00000a7d4e579_0.gt]
After the above log line it is stuck again for about 30 mins.
Once it moves on, I see the log line below:
2017-03-17 04:43:43.901 unknown:0x7f424de79780 [SAL] Unmounting file system 2(Libhdfs++ File System).
After the above log line I get this error on the terminal where I ran the restart:
Error starting database, no nodes are up
Press RETURN to continue
Then I get this prompt:
"Database startup failed, but enough information is available to start the database from a previous epoch.
Do you really want to restart the database from epoch 15981105?"
Once I enter Yes here, the database comes up.
Has anyone faced this issue and found a way to fix it?
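
To pin down where the startup actually stalls, the gaps between log timestamps can be measured instead of estimated by eye. Below is a minimal sketch that does this for the three log lines quoted above. It assumes each line starts with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp, as those lines do; the helper names `parse_ts` and `gaps` are made up for this example.

```python
import re
from datetime import datetime

# Assumption: every log line of interest starts with this timestamp layout.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    match = TS.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")


def gaps(lines):
    """Return the seconds elapsed between consecutive timestamped lines."""
    stamps = [ts for ts in map(parse_ts, lines) if ts is not None]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# The three log lines quoted in this post, unchanged.
quoted = [
    "2017-03-17 04:12:08.875 Main:0x90e99e0 [ResourceManager] pool recovery - "
    "Queries: 1 Threads: 5216 File Handles: 33458 Memory(KB): 6302166",
    "2017-03-17 04:12:50.488 Main:0x90e99e0 [Catalog] Queueing unknown file in "
    "storage directory for removal [/db_data/data/drdata/v_drdata_node0001_data/"
    "409/025b8f57674579e47ef88a72848fd19d00a00000a7d4e579_0.gt]",
    "2017-03-17 04:43:43.901 unknown:0x7f424de79780 [SAL] Unmounting file system "
    "2(Libhdfs++ File System).",
]

for seconds in gaps(quoted):
    print(f"{seconds:.3f} s")
# prints:
# 41.613 s
# 1853.413 s
```

For the quoted lines this gives roughly 42 seconds between the first two entries and roughly 31 minutes between the second and third. Running the same function over the full vertica.log and sorting by gap size would show which entries precede the longest pauses.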