Can't initialize Vertica in a 3-node cluster after 1 machine went down
Hi,

I have a 3-node cluster. One of the machines is out of service because I need to reinstall it; however, I assumed that the other machines would keep working properly. I see the following messages in vertica.log on one of the machines that should be working:

WARNING:pyinotify:Unable to retrieve Watch object associated to <_RawEvent cookie=0 mask=0x8000 name='' wd=1 >
2013-08-21 10:04:48.123 Poll dispatch:0x6290450 [Comms] error SP_receive: Connection closed by spread
2013-08-21 10:04:48.123 Poll dispatch:0x6290450 [Comms] error SP_receive: The network socket experienced an error. This Spread mailbox will no longer work until the connection is disconnected and then reconnected
2013-08-21 22:21:00.232 Timer Service:0x7b566c0 @v_testdb_node0002: 00000/5021: Timer service done; closing session
2013-08-21 22:21:00.733 Main:0x5fec600 @v_testdb_node0002: 00000/3298: Event Posted: Event Code:6 Event Id:9 Event Severity: Informational [6] PostedTimestamp: 2013-08-21 22:21:00.733169 ExpirationTimestamp: 2081-09-09 00:35:07.733169 EventCodeDescription: Node State Change ProblemDescription: Changing node v_testdb_node0002 startup state to SHUTDOWN_ERROR DatabaseName: testdb Hostname: avert02
2013-08-21 22:21:00.733 Main:0x5fec600 [Recover] Changing node v_testdb_node0002 startup state from INITIALIZING to SHUTDOWN_ERROR
2013-08-21 22:21:00.733 Main:0x5fec600 [Txn] Begin Txn: b00000000043f5 'Recovery: Get last good epoch'
2013-08-21 22:21:00.733 Main:0x5fec600 [Txn] Starting Commit: Txn: b00000000043f5 'Recovery: Get last good epoch'
2013-08-21 22:21:00.734 Main:0x5fec600 [Txn] Commit Complete: Txn: b00000000043f5 at epoch 0xf
2013-08-21 22:21:00.734 Main:0x5fec600 [Recover] Manual recovery possible: Last good epoch=0xe

Do I need to remove the reference to the broken node (/opt/vertica/sbin/update_vertica -R)? Doesn't Vertica ignore a failed machine?

Thanks