Node State Change, Changing Node ... Startup State To UP


I have a 3-node Vertica 7.0.0 cluster up and running,
but every day the Management Console shows
about 700 warnings like these:

SUMMARY:Node State Change

Host IP:192.168.111.115

Time Of Occurence:31 Mar 2014 14:11:29

Number Of Occurences Of This Message:50 (Last Occurence At: 31 Mar 2014 14:11:29)

Description:Changing Node V_jasper_node0002 Startup State To UP

SUMMARY:Node State Change

Host IP:192.168.111.116

Time Of Occurence:31 Mar 2014 14:11:29

Number Of Occurences Of This Message:50 (Last Occurence At: 31 Mar 2014 14:11:29)

Description:Changing Node V_jasper_node0003 Startup State To UP

SUMMARY:Node State Change

Host IP:192.168.111.114

Time Of Occurence:31 Mar 2014 14:11:29

Number Of Occurences Of This Message:50 (Last Occurence At: 31 Mar 2014 14:11:29)

Description:Changing Node V_jasper_node0001 Startup State To UP


Comments

  • Navin_C (Vertica Customer)
    Hi Massimo,

    I would suggest checking the vertica.log file to see what is happening when the warning fires.
    Also check whether your nodes go down frequently; that should not happen.

    Hope this helps
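A minimal Python sketch of that check, assuming you pass it the path of one node's vertica.log (its location depends on your catalog directory). The regular expressions are built only from message texts quoted later in this thread ("left the cluster", "Cluster partitioned", "Setting node ... to UNSAFE"); other Vertica versions may word them differently, and the script name scan_vertica_log.py is hypothetical.

```python
import re
import sys
from collections import Counter

# Patterns copied from the vertica.log excerpts in this thread (assumption:
# the wording is the same in your Vertica version).
LEFT = re.compile(r"nodeSetNotifier: node (\S+) left the cluster")
PARTITION = re.compile(r"Cluster partitioned: (\d+) total nodes, (\d+) up nodes")
UNSAFE = re.compile(r"Setting node (\S+) to UNSAFE")
TIMESTAMP = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def scan(lines):
    """Return (departures per node, list of (timestamp, message)) for membership events."""
    departures = Counter()
    events = []
    for line in lines:
        ts_match = TIMESTAMP.match(line)
        ts = ts_match.group(1) if ts_match else "?"
        if m := LEFT.search(line):
            departures[m.group(1)] += 1
            events.append((ts, f"{m.group(1)} left the cluster"))
        elif m := PARTITION.search(line):
            events.append((ts, f"partitioned: {m.group(2)} of {m.group(1)} nodes up"))
        elif m := UNSAFE.search(line):
            events.append((ts, f"{m.group(1)} set to UNSAFE"))
    return departures, events


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_vertica_log.py /path/to/vertica.log
    with open(sys.argv[1], errors="replace") as f:
        departures, events = scan(f)
    for ts, msg in events:
        print(ts, msg)
    for node, count in departures.most_common():
        print(f"{node}: left {count} time(s)")
```

Running it on each node and comparing the timestamps shows whether departures coincide with the Management Console warnings or happen independently.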


  • I have a 3-node Vertica 7.0 cluster on CentOS 6.5 running on a VMware ESX 5 hypervisor; the network interface is VMXNET3.
    The problem started after installing VMware Tools...

    Eventually one node goes DOWN.
    It looks to me like a Spread problem with the network:


    2014-03-31 17:27:38.327 Spread Client:0x75515d0 [Comms] <INFO> Saw membership message 8192 on V:jasper
    2014-03-31 17:27:38.327 Spread Client:0x75515d0 [Comms] <INFO> Saw transitional message; watch for lost daemons
    2014-03-31 17:27:38.327 Spread Client:0x75515d0 [Comms] <INFO> Saw membership message 8192 on Vertica:all
    2014-03-31 17:27:38.327 Spread Client:0x75515d0 [Comms] <INFO> Saw transitional message; watch for lost daemons
    2014-03-31 17:27:38.327 Spread Client:0x75515d0 [Comms] <INFO> Saw membership message 8192 on Vertica:join
    2014-03-31 17:27:38.327 Spread Client:0x75515d0 [Comms] <INFO> Saw transitional message; watch for lost daemons
    2014-03-31 17:27:38.328 Spread Client:0x75515d0 [Comms] <INFO> Saw membership message 6144 on V:jasper
    2014-03-31 17:27:38.328 Spread Client:0x75515d0 [Comms] <INFO> NETWORK change with 1 VS sets
    2014-03-31 17:27:38.328 Spread Client:0x75515d0 [Comms] <INFO> DB Group changed
    2014-03-31 17:27:38.328 Spread Client:0x75515d0 [Comms] <INFO>   Got current member #node_a#N192168111114, v_jasper_node0001 is UP
    2014-03-31 17:27:38.328 Spread Client:0x75515d0 [Comms] <INFO>   Got current member #node_b#N192168111115, v_jasper_node0002 is UP
    2014-03-31 17:27:38.328 Spread Client:0x75515d0 [VMPI] <INFO> DistCall: Set current group members called with 2 members
    2014-03-31 17:27:38.328 Spread Client:0x75515d0 [VMPI] <INFO> Removing 45035996273719658 from list of initialized nodes for session vertica01-5030:0x1355
    2014-03-31 17:27:38.328 Spread Client:0x75515d0 [VMPI] <INFO> Removing 45035996273719658 from list of initialized nodes for session vertica01-5030:0x1356
    2014-03-31 17:27:38.328 Spread Client:0x75515d0 [VMPI] <INFO> Removing 45035996273719658 from list of initialized nodes for session vertica01-5030:0x1725
    2014-03-31 17:27:38.328 Spread Client:0x75515d0 [VMPI] <INFO> Removing 45035996273719658 from list of initialized nodes for session vertica01-5030:0x1727

    2014-03-31 17:27:38.328 Spread Client:0x75515d0 [Comms] <INFO> nodeSetNotifier: node v_jasper_node0003 left the cluster
    2014-03-31 17:27:38.328 Spread Client:0x75515d0 [Recover] <INFO> Node left cluster, reassessing k-safety...
    2014-03-31 17:27:38.351 Spread Client:0x75515d0 [Recover] <INFO> Checking Deps:Down bits: 100 Deps:
    011 - cnt: 159
    101 - cnt: 159
    110 - cnt: 159
    2014-03-31 17:27:38.351 Spread Client:0x75515d0 <LOG> @v_jasper_node0001: 00000/3298: Event Posted: Event Code:3 Event Id:0 Event Severity: Critical [2] PostedTimestamp: 2014-03-31 17:27:38.351722 ExpirationTimestamp: 2082-04-18 19:41:45.351722 EventCodeDescription: Current Fault Tolerance at Critical Level ProblemDescription: Loss of node v_jasper_node0001 will cause shutdown to occur. K=1 total number of nodes=3 DatabaseName: jasper Hostname: vertica01
    2014-03-31 17:27:38.352 Spread Client:0x75515d0 [Comms] <INFO> Saw membership message 6144 on Vertica:all
    2014-03-31 17:27:38.352 Spread Client:0x75515d0 [Comms] <INFO> Removing #node_c#N192168111116->v_jasper_node0003 from processToNode and other maps due to departure from Vertica:all 
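The "Checking Deps" lines are not explained in the thread. One reading, which is an assumption and not documented Vertica behaviour: each Deps entry is a bit mask of nodes that together hold copies of some data (pairs, since K=1), the rightmost bit is node0001 (consistent with "100" here, when node0003 left, and "001" in the later node-3 log, when node0001 left), and the database keeps running only while every group still has at least one UP node. Under that reading, this sketch lists which remaining nodes would be fatal to lose; the function names are hypothetical.

```python
from __future__ import annotations


def parse_bits(bits: str) -> set[int]:
    """Map a bit string such as '100' to 1-based node numbers.

    Assumption: the rightmost character is node0001.
    """
    return {i + 1 for i, ch in enumerate(reversed(bits)) if ch == "1"}


def fatal_losses(down_bits: str, deps: list[str], n_nodes: int) -> list[int]:
    """Return the UP nodes whose loss would leave some dependency group with no UP member."""
    down = parse_bits(down_bits)
    groups = [parse_bits(d) for d in deps]
    fatal = []
    for node in range(1, n_nodes + 1):
        if node in down:
            continue  # already down, nothing more to lose
        would_be_down = down | {node}
        # A group is lost when every node in it is down.
        if any(group <= would_be_down for group in groups):
            fatal.append(node)
    return fatal


if __name__ == "__main__":
    deps = ["011", "101", "110"]  # Deps lines from the log above
    print(fatal_losses("100", deps, 3))  # [1, 2]
```

With node0003 down, losing either node0001 or node0002 empties a group, which matches the Critical fault-tolerance event above; each node appears to post the event about itself, which would explain why the one logged on vertica01 names node0001.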


  • Finally SOLVED: the problems (Vertica nodes going into the DOWN state) were caused by a wrong installation of VMware Tools on a VMXNET3 10 Gb virtual interface. See http://steronius.blogspot.it/2013/01/install-vmware-tools-via-repository-for.html for a way to fix it.


  • Here we go again. After a period in which everything worked, nodes are going down again in what seems a random way. In the meantime I upgraded Vertica to version 7.0.1. Monitoring does not detect any network problems.

    All 3 nodes run CentOS 6.5 (Final), hosted on VMware ESX 5.1,
    each with 4 CPUs (Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz) and 32 GB RAM:
    Linux  2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux


    Node 1 was doing a moveout:

    2014-04-13 09:13:52.036 TM Moveout:0x7f876801b850-a00000006cdb21 [EE] <INFO> (a00000006cdb21) Executing the moveout old replay delete plan
    2014-04-13 09:13:52.043 EEThread:0x5e0a140-a00000006cdb21 [EE] <INFO> Running ROS from sort buffer. Merge chunks = 1, merges per batch = 0
    2014-04-13 09:13:52.043 EEThread:0x5e0a140-a00000006cdb21 [EE] <INFO> Finished writing ROSes from sort buffer.
    2014-04-13 09:13:52.045 TM Moveout:0x7f876801b850-a00000006cdb21 [LocalPlanner] <INFO> (a00000006cdb21) Plan type: TM_REDELETE_MOVE, Plan subtype: UNKNOWN SUBTYPE - Created new DVMiniRos <45035996878849841,45035996878849843> pointing at storage 45035996878849825
    2014-04-13 09:13:52.045 TM Moveout:0x7f876801b850-a00000006cdb21 [EE] <INFO> (a00000006cdb21) End of executing the moveout old replay delete plan
    2014-04-13 09:13:52.045 TM Moveout:0x7f876801b850-a00000006cdb21 [EE] <INFO> (a00000006cdb21) Executing DVWos moveout plans
    2014-04-13 09:13:52.045 TM Moveout:0x7f876801b850-a00000006cdb21 [EE] <INFO> (a00000006cdb21) getMiniROSsForDVWosMoveout: No DVWOSs to moveout
    2014-04-13 09:13:52.045 TM Moveout:0x7f876801b850-a00000006cdb21 [EE] <INFO> (a00000006cdb21) Deleting Wos and DV contents
    2014-04-13 09:13:52.045 TM Moveout:0x7f876801b850-a00000006cdb21 [EE] <INFO> (a00000006cdb21) Dropping source WOSs
    2014-04-13 09:13:52.045 TM Moveout:0x7f876801b850-a00000006cdb21 [EE] <INFO>    45035996878849767
    2014-04-13 09:13:52.045 TM Moveout:0x7f876801b850-a00000006cdb21 [EE] <INFO> (a00000006cdb21) Moved out 49152 bytes
    2014-04-13 09:13:52.047 TM Moveout:0x7f876801b850-a00000006cdb21 [EE] <INFO> Moveout projection public.GC_last_jgm860_DBD_81_seg_jfull_b1 - done
    2014-04-13 09:13:52.047 TM Moveout:0x7f876801b850-a00000006cdb21 [Txn] <INFO> Starting Commit: Txn: a00000006cdb21 'Moveout: (Table: public.GC_last_jgm860) (Projection: public.GC_last_jgm860_DBD_81_seg_jfull_b1)'
    2014-04-13 09:13:52.047 TM Moveout:0x7f876801b850 [Txn] <INFO> Commit Complete: Txn: a00000006cdb21 at epoch 0x85a77d
    2014-04-13 09:13:52.047 TM Moveout:0x7f876801b850 [TM] <INFO> Tuple Mover: moved out projection GC_last_jgm860_DBD_81_seg_jfull_b1
    2014-04-13 09:13:52.047 TM Moveout:0x7f876801b850-a00000006cdb24 [Txn] <INFO> Begin Txn: a00000006cdb24 'Moveout: Tuple Mover'
    2014-04-13 09:13:52.048 TM Moveout:0x7f876801b850-a00000006cdb24 [Txn] <INFO> Rollback Txn: a00000006cdb24 'Moveout: Tuple Mover'
    2014-04-13 09:13:52.048 TM Moveout:0x7f876801b850 [Util] <INFO> Task 'TM Moveout' enabled



    Node 2 was doing a moveout:



    2014-04-13 09:13:49.570 TM Mergeout(01):0x7f7d2c01b4f0-b000000014f806 [TM] <INFO> Tuple Mover: nothing to merge out
    2014-04-13 09:13:49.595 TM Mergeout(01):0x7f7d2c01b4f0-b000000014f806 [TM] <INFO> Tuple Mover: no DV to merge out
    2014-04-13 09:13:49.595 TM Mergeout(01):0x7f7d2c01b4f0-b000000014f806 [Txn] <INFO> Rollback Txn: b000000014f806 'Mergeout: Tuple Mover'
    2014-04-13 09:13:49.607 TM Mergeout(01):0x7f7d2c01b4f0 [Util] <INFO> Task 'TM Mergeout(01)' enabled
    2014-04-13 09:13:49.613 TM Mergeout(00):0x7f7d2c01bb80-b000000014f809 [TM] <INFO> Tuple Mover: nothing to merge out
    2014-04-13 09:13:49.626 TM Mergeout(00):0x7f7d2c01bb80-b000000014f809 [TM] <INFO> Tuple Mover: no DV to merge out
    2014-04-13 09:13:49.626 TM Mergeout(00):0x7f7d2c01bb80-b000000014f809 [Txn] <INFO> Rollback Txn: b000000014f809 'Mergeout: Tuple Mover'
    2014-04-13 09:13:49.631 TM Mergeout(00):0x7f7d2c01bb80-b000000014f80b [Txn] <INFO> Begin Txn: b000000014f80b 'collectMoveStorageJobs'
    2014-04-13 09:13:49.645 TM Mergeout(00):0x7f7d2c01bb80-b000000014f80b [Txn] <INFO> Rollback Txn: b000000014f80b 'collectMoveStorageJobs'
    2014-04-13 09:13:49.645 TM Mergeout(00):0x7f7d2c01bb80 [Util] <INFO> Task 'TM Mergeout(00)' enabled
    2014-04-13 09:13:52.002 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [Txn] <INFO> Begin Txn: b000000014f80c 'Moveout: Tuple Mover'
    2014-04-13 09:13:52.003 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [TM] <INFO> Tuple Mover: moving out projection GC_last_jgm860_DBD_81_seg_jfull_b1
    2014-04-13 09:13:52.003 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO> (b000000014f80c) Moveout projection public.GC_last_jgm860_DBD_81_seg_jfull_b1
    2014-04-13 09:13:52.003 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO> (b000000014f80c) TM Moveout: moving out data in WOS for proj "public.GC_last_jgm860_DBD_81_seg_jfull_b1" to epoch 8759164
    2014-04-13 09:13:52.003 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO> (b000000014f80c) Executing the moveout plan
    2014-04-13 09:13:52.016 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [LocalPlanner] <INFO> (b000000014f80c) Plan type: TM_MOVEOUT, Plan subtype: UNKNOWN SUBTYPE - Created new (DelId) MiniRos 49539596464025395 (Grouped: No)
    2014-04-13 09:13:52.018 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO> (b000000014f80c) TM Moveout: Moving out DV data in WOS up to epoch 8759165, based on WOS data up to epoch 8759164
    2014-04-13 09:13:52.018 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO> (b000000014f80c) TM moveout: Wos row count = 1, Wos delete row count = 1
    2014-04-13 09:13:52.018 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO> (b000000014f80c) Executing the moveout old replay delete plan
    2014-04-13 09:13:52.024 EEThread:0x7f7c5c36f6e0-b000000014f80c [EE] <INFO> Running ROS from sort buffer. Merge chunks = 1, merges per batch = 0
    2014-04-13 09:13:52.025 EEThread:0x7f7c5c36f6e0-b000000014f80c [EE] <INFO> Finished writing ROSes from sort buffer.
    2014-04-13 09:13:52.027 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [LocalPlanner] <INFO> (b000000014f80c) Plan type: TM_REDELETE_MOVE, Plan subtype: UNKNOWN SUBTYPE - Created new DVMiniRos <49539596464025411,49539596464025413> pointing at storage 49539596464025395
    2014-04-13 09:13:52.027 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO> (b000000014f80c) End of executing the moveout old replay delete plan
    2014-04-13 09:13:52.027 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO> (b000000014f80c) Executing DVWos moveout plans
    2014-04-13 09:13:52.027 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO> (b000000014f80c) getMiniROSsForDVWosMoveout: No DVWOSs to moveout
    2014-04-13 09:13:52.028 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO> (b000000014f80c) Deleting Wos and DV contents
    2014-04-13 09:13:52.028 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO> (b000000014f80c) Dropping source WOSs
    2014-04-13 09:13:52.028 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO>    49539596464025349
    2014-04-13 09:13:52.028 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO> (b000000014f80c) Moved out 32768 bytes
    2014-04-13 09:13:52.030 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [EE] <INFO> Moveout projection public.GC_last_jgm860_DBD_81_seg_jfull_b1 - done
    2014-04-13 09:13:52.030 TM Moveout:0x7f7d2c01a3d0-b000000014f80c [Txn] <INFO> Starting Commit: Txn: b000000014f80c 'Moveout: (Table: public.GC_last_jgm860) (Projection: public.GC_last_jgm860_DBD_81_seg_jfull_b1)'
    2014-04-13 09:13:52.030 TM Moveout:0x7f7d2c01a3d0 [Txn] <INFO> Commit Complete: Txn: b000000014f80c at epoch 0x85a77d
    2014-04-13 09:13:52.030 TM Moveout:0x7f7d2c01a3d0 [TM] <INFO> Tuple Mover: moved out projection GC_last_jgm860_DBD_81_seg_jfull_b1
    2014-04-13 09:13:52.031 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [Txn] <INFO> Begin Txn: b000000014f80f 'Moveout: Tuple Mover'
    2014-04-13 09:13:52.032 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [TM] <INFO> Tuple Mover: moving out projection GC_last_jgm860_DBD_81_seg_jfull_b0
    2014-04-13 09:13:52.032 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) Moveout projection public.GC_last_jgm860_DBD_81_seg_jfull_b0
    2014-04-13 09:13:52.032 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) TM Moveout: moving out data in WOS for proj "public.GC_last_jgm860_DBD_81_seg_jfull_b0" to epoch 8759164
    2014-04-13 09:13:52.032 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) Executing the moveout plan
    2014-04-13 09:13:52.045 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [LocalPlanner] <INFO> (b000000014f80f) Plan type: TM_MOVEOUT, Plan subtype: UNKNOWN SUBTYPE - Created new (DelId) MiniRos 49539596464025415 (Grouped: No)
    2014-04-13 09:13:52.047 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [LocalPlanner] <INFO> (b000000014f80f) Plan type: TM_MOVEOUT, Plan subtype: UNKNOWN SUBTYPE - Created new (DelId) MiniRos 49539596464025431 (Grouped: No)
    2014-04-13 09:13:52.049 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) TM Moveout: Moving out DV data in WOS up to epoch 8759165, based on WOS data up to epoch 8759164
    2014-04-13 09:13:52.049 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) TM moveout: Wos row count = 2, Wos delete row count = 1
    2014-04-13 09:13:52.049 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) Executing the moveout old replay delete plan
    2014-04-13 09:13:52.056 EEThread:0x7f7c5c36f8c0-b000000014f80f [EE] <INFO> Running ROS from sort buffer. Merge chunks = 1, merges per batch = 0
    2014-04-13 09:13:52.057 EEThread:0x7f7c5c36f8c0-b000000014f80f [EE] <INFO> Finished writing ROSes from sort buffer.
    2014-04-13 09:13:52.067 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [LocalPlanner] <INFO> (b000000014f80f) Plan type: TM_REDELETE_MOVE, Plan subtype: UNKNOWN SUBTYPE - Created new DVMiniRos <49539596464025447,49539596464025449> pointing at storage 49539596464025415
    2014-04-13 09:13:52.067 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) End of executing the moveout old replay delete plan
    2014-04-13 09:13:52.067 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) Executing DVWos moveout plans
    2014-04-13 09:13:52.067 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) getMiniROSsForDVWosMoveout: Creating DVWos marker(s)
    2014-04-13 09:13:52.068 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) Executing DVWos moveout plan over parent ROS 49539596464025309
    2014-04-13 09:13:52.072 EEThread:0x7f7c5c0c23d0-b000000014f80f [EE] <INFO> Running ROS from sort buffer. Merge chunks = 1, merges per batch = 0
    2014-04-13 09:13:52.072 EEThread:0x7f7c5c0c23d0-b000000014f80f [EE] <INFO> Finished writing ROSes from sort buffer.
    2014-04-13 09:13:52.074 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [LocalPlanner] <INFO> (b000000014f80f) Plan type: TM_DVWOS_MOVEOUT, Plan subtype: UNKNOWN SUBTYPE - Created new DVMiniRos <49539596464025451,49539596464025453> pointing at storage 49539596464025309
    2014-04-13 09:13:52.074 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) Executing DVWos moveout plan over parent ROS 49539596464025325
    2014-04-13 09:13:52.078 EEThread:0x7f7c5c2cd0a0-b000000014f80f [EE] <INFO> Running ROS from sort buffer. Merge chunks = 1, merges per batch = 0
    2014-04-13 09:13:52.078 EEThread:0x7f7c5c2cd0a0-b000000014f80f [EE] <INFO> Finished writing ROSes from sort buffer.
    2014-04-13 09:13:52.080 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [LocalPlanner] <INFO> (b000000014f80f) Plan type: TM_DVWOS_MOVEOUT, Plan subtype: UNKNOWN SUBTYPE - Created new DVMiniRos <49539596464025455,49539596464025457> pointing at storage 49539596464025325
    2014-04-13 09:13:52.080 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) Deleting Wos and DV contents
    2014-04-13 09:13:52.080 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) Dropping source WOSs
    2014-04-13 09:13:52.080 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO>    49539596464025359
    2014-04-13 09:13:52.081 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> (b000000014f80f) Moved out 81920 bytes
    2014-04-13 09:13:52.083 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [EE] <INFO> Moveout projection public.GC_last_jgm860_DBD_81_seg_jfull_b0 - done
    2014-04-13 09:13:52.083 TM Moveout:0x7f7d2c01a3d0-b000000014f80f [Txn] <INFO> Starting Commit: Txn: b000000014f80f 'Moveout: (Table: public.GC_last_jgm860) (Projection: public.GC_last_jgm860_DBD_81_seg_jfull_b0)'
    2014-04-13 09:13:52.084 TM Moveout:0x7f7d2c01a3d0 [Txn] <INFO> Commit Complete: Txn: b000000014f80f at epoch 0x85a77d
    2014-04-13 09:13:52.084 TM Moveout:0x7f7d2c01a3d0 [TM] <INFO> Tuple Mover: moved out projection GC_last_jgm860_DBD_81_seg_jfull_b0
    2014-04-13 09:13:52.084 TM Moveout:0x7f7d2c01a3d0-b000000014f814 [Txn] <INFO> Begin Txn: b000000014f814 'Moveout: Tuple Mover'
    2014-04-13 09:13:52.084 TM Moveout:0x7f7d2c01a3d0-b000000014f814 [Txn] <INFO> Rollback Txn: b000000014f814 'Moveout: Tuple Mover'
    2014-04-13 09:13:52.085 TM Moveout:0x7f7d2c01a3d0 [Util] <INFO> Task 'TM Moveout' enabled




    Vertica log on node 3:

    2014-04-13 09:15:29.757 Spread Client:0x6e46f30 [Comms] <INFO> Saw membership message 8192 on V:jasper
    2014-04-13 09:15:29.757 Spread Client:0x6e46f30 [Comms] <INFO> Saw transitional message; watch for lost daemons
    2014-04-13 09:15:29.757 Spread Client:0x6e46f30 [Comms] <INFO> Saw membership message 8192 on Vertica:all
    2014-04-13 09:15:29.757 Spread Client:0x6e46f30 [Comms] <INFO> Saw transitional message; watch for lost daemons
    2014-04-13 09:15:29.757 Spread Client:0x6e46f30 [Comms] <INFO> Saw membership message 8192 on Vertica:join
    2014-04-13 09:15:29.757 Spread Client:0x6e46f30 [Comms] <INFO> Saw transitional message; watch for lost daemons
    2014-04-13 09:15:29.763 Spread Client:0x6e46f30 [Comms] <INFO> Saw membership message 6144 on V:jasper
    2014-04-13 09:15:29.763 Spread Client:0x6e46f30 [Comms] <INFO> NETWORK change with 2 VS sets
    2014-04-13 09:15:29.763 Spread Client:0x6e46f30 [Comms] <INFO> DB Group changed
    2014-04-13 09:15:29.763 Spread Client:0x6e46f30 [VMPI] <INFO> DistCall: Set current group members called with 1 members
    2014-04-13 09:15:29.764 Spread Client:0x6e46f30 [Comms] <INFO> nodeSetNotifier: node v_jasper_node0001 left the cluster
    2014-04-13 09:15:29.764 Spread Client:0x6e46f30 [Recover] <INFO> Node left cluster, reassessing k-safety...
    2014-04-13 09:15:29.764 Spread Client:0x6e46f30 [Recover] <INFO> Checking Deps:Down bits: 001 Deps:
    011 - cnt: 186
    101 - cnt: 186
    110 - cnt: 186
    2014-04-13 09:15:29.765 Spread Client:0x6e46f30 <LOG> @v_jasper_node0003: 00000/3298: Event Posted: Event Code:3 Event Id:0 Event Severity: Critical [2] PostedTimestamp: 2014-04-13 09:15:29.764964 ExpirationTimestamp: 2082-05-01 11:29:36.764964 EventCodeDescription: Current Fault Tolerance at Critical Level ProblemDescription: Loss of node v_jasper_node0003 will cause shutdown to occur. K=1 total number of nodes=3 DatabaseName: jasper Hostname: vertica03
    2014-04-13 09:15:29.801 Spread Client:0x6e46f30 [Comms] <INFO> nodeSetNotifier: node v_jasper_node0002 left the cluster
    2014-04-13 09:15:29.801 Spread Client:0x6e46f30 [Recover] <INFO> Node left cluster, reassessing k-safety...
    2014-04-13 09:15:29.801 Spread Client:0x6e46f30 [Recover] <INFO> Cluster partitioned: 3 total nodes, 1 up nodes, 2 down nodes
    2014-04-13 09:15:29.801 Spread Client:0x6e46f30 [Recover] <INFO> Setting node v_jasper_node0003 to UNSAFE
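Node 3 ended up alone: it saw 1 of 3 nodes up and marked itself UNSAFE. My understanding is that Vertica lets a group of nodes keep the database running only if it contains more than half of the cluster's nodes. The helper below applies that strict-majority rule to the "Cluster partitioned" line; the function names are hypothetical.

```python
import re

# Message text copied from the node-3 log above.
PARTITIONED = re.compile(
    r"Cluster partitioned: (\d+) total nodes, (\d+) up nodes, (\d+) down nodes"
)


def has_quorum(total_nodes: int, up_nodes: int) -> bool:
    """Strict majority: more than half of all nodes must be UP."""
    return 2 * up_nodes > total_nodes


def check_partition(line: str):
    """Parse a 'Cluster partitioned' line; return None for any other line."""
    m = PARTITIONED.search(line)
    if m is None:
        return None
    total, up, down = (int(g) for g in m.groups())
    return {"total": total, "up": up, "down": down, "quorum": has_quorum(total, up)}


if __name__ == "__main__":
    line = ("2014-04-13 09:15:29.801 Spread Client:0x6e46f30 [Recover] <INFO> "
            "Cluster partitioned: 3 total nodes, 1 up nodes, 2 down nodes")
    print(check_partition(line))
    # {'total': 3, 'up': 1, 'down': 2, 'quorum': False}
```

In a 3-node cluster, two nodes that can still see each other keep a majority (`has_quorum(3, 2)` is True), whereas an isolated node does not.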



  • UPDATE: all problems were solved by installing the correct VMware Tools for the virtual NICs.
  • Hi Massimo,

    Could you provide more details about your installation? We are aware of the constant node state change messages, and this issue was addressed in the latest Vertica release. It would be helpful to know in more detail how you were able to resolve it. Thanks!

    -Bohyun
  • Hi Bohyun,

    My Vertica Community Edition 7.0.1 is installed on 3 CentOS 6.5 x64 hosts.

    I added the vmware-tools repository and installed
    these components, replacing the default
    CentOS VM NIC drivers:


    vmware-tools-core.x86_64                         9.0.13-1.el6                         vmware-tools
    vmware-tools-esx.x86_64                          9.0.13-1.el6                         vmware-tools
    vmware-tools-esx-kmods.x86_64                    9.0.13-1.el6                         vmware-tools
    vmware-tools-esx-nox.x86_64                      9.0.13-1.el6                         vmware-tools
    vmware-tools-foundation.x86_64                   9.0.13-1.el6                         vmware-tools
    vmware-tools-guestlib.x86_64                     9.0.13-1.el6                         vmware-tools
    vmware-tools-guestsdk.x86_64                     9.0.13-1.el6                         vmware-tools
    vmware-tools-hgfs.x86_64                         9.0.13-1.el6                         vmware-tools
    vmware-tools-libraries-nox.x86_64                9.0.13-1.el6                         vmware-tools
    vmware-tools-libraries-x.x86_64                  9.0.13-1.el6                         vmware-tools
    vmware-tools-plugins-autoUpgrade.x86_64          9.0.13-1.el6                         vmware-tools
    vmware-tools-plugins-deployPkg.x86_64            9.0.13-1.el6                         vmware-tools
    vmware-tools-plugins-desktopEvents.x86_64        9.0.13-1.el6                         vmware-tools
    vmware-tools-plugins-dndcp.x86_64                9.0.13-1.el6                         vmware-tools
    vmware-tools-plugins-guestInfo.x86_64            9.0.13-1.el6                         vmware-tools
    vmware-tools-plugins-hgfsServer.x86_64           9.0.13-1.el6                         vmware-tools
    vmware-tools-plugins-powerOps.x86_64             9.0.13-1.el6                         vmware-tools
    vmware-tools-plugins-resolutionSet.x86_64        9.0.13-1.el6                         vmware-tools
    vmware-tools-plugins-timeSync.x86_64             9.0.13-1.el6                         vmware-tools
    vmware-tools-plugins-unity.x86_64                9.0.13-1.el6                         vmware-tools
    vmware-tools-plugins-vix.x86_64                  9.0.13-1.el6                         vmware-tools
    vmware-tools-plugins-vmbackup.x86_64             9.0.13-1.el6                         vmware-tools
    vmware-tools-pvscsi-common.x86_64                9.0.13-5.el6                         vmware-tools
    vmware-tools-services.x86_64                     9.0.13-1.el6                         vmware-tools
    vmware-tools-thinprint.x86_64                    9.0.13-1.el6                         vmware-tools
    vmware-tools-user.x86_64                         9.0.13-1.el6                         vmware-tools
    vmware-tools-vmblock-common.x86_64               9.0.13-5.el6                         vmware-tools
    vmware-tools-vmci-common.x86_64                  9.0.13-5.el6                         vmware-tools
    vmware-tools-vmhgfs-common.x86_64                9.0.13-5.el6                         vmware-tools
    vmware-tools-vmmemctl-common.x86_64              9.0.13-5.el6                         vmware-tools
    vmware-tools-vmxnet-common.x86_64                9.0.13-5.el6                         vmware-tools
    vmware-tools-vmxnet3-common.x86_64               9.0.13-5.el6                         vmware-tools
    vmware-tools-vsock-common.x86_64                 9.0.13-5.el6                         vmware-tools
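To confirm on each node that these packages really came from the vmware-tools repository, the listing can be checked with a small script. A minimal sketch, assuming the input is three-column `yum list` text like the one above (name.arch, version-release, repository); depending on the yum command, installed packages may show the repository with a leading @, which the sketch strips. The function names and the choice of required packages are only examples.

```python
from __future__ import annotations

import re

# name.arch  version-release  repository  (the arch itself contains no dot)
ROW = re.compile(r"^\s*(\S+)\.([^.\s]+)\s+(\S+)\s+(\S+)\s*$")


def parse_yum_list(text: str) -> dict[str, tuple[str, str]]:
    """Map package name -> (version-release, repository) for each matching row."""
    packages = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            name, _arch, version, repo = m.groups()
            packages[name] = (version, repo.lstrip("@"))
    return packages


def not_from_repo(packages: dict[str, tuple[str, str]], required: list[str],
                  repo: str = "vmware-tools") -> list[str]:
    """Return required packages that are missing or came from a different repository."""
    return [name for name in required if packages.get(name, ("", ""))[1] != repo]


if __name__ == "__main__":
    listing = """\
vmware-tools-esx-kmods.x86_64                    9.0.13-1.el6                         vmware-tools
vmware-tools-vmxnet3-common.x86_64               9.0.13-5.el6                         vmware-tools
"""
    pkgs = parse_yum_list(listing)
    print(not_from_repo(pkgs, ["vmware-tools-esx-kmods", "vmware-tools-vmxnet3-common"]))
    # []
```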


    Before this change the Vertica hosts shut down often; after this setup
    we still see frequent node state change messages, but no more shutdowns.

    I hope this is useful.

    Ciao, Massimo
