Node goes down frequently
Navin_C
Vertica Customer ✭
Hello all,
I am experiencing issues with my nodes.
One of my nodes goes down very frequently. I checked the vertica.log file and found this at the time it went down:
2014-03-18 18:13:22.298 Spread Client:0x7fb58f0 [Comms] <INFO> nodeSetNotifier: node node0005 left the cluster
2014-03-18 18:13:22.299 Spread Client:0x7fb58f0 [Comms] <INFO> Removing #node_b#N010102010107->node0005 from processToNode and other maps due to departure from Vertica:all
From here onwards, I see "Received no response from node0005 in transactions*" messages.
It looks like there is some issue with Spread.
What might be the reason behind this?
Comments
Spread issues are often a result of network issues. Specifically, Spread sends a lot of UDP messages. If they don't go through, Spread can't keep the node synchronized with the cluster, so it drops the node from the cluster.
What does your network topology look like? Is node 5 on a different rack, behind a router, etc.? Are you seeing dropped packets on the machine? Does Vertica's network-test tool report any issues? Are there any other indications of a flaky network?
We occasionally see this in clusters with very high traffic (lots of resegments, etc). Some switches are configured to prioritize TCP over UDP when traffic volumes are high, in which case the cluster's data traffic can starve its control-message traffic. The workaround is to re-configure the switch to treat UDP fairly, or to put Spread traffic on a different network. (A different physical network is ideal, but a separate VLAN on the same physical network is usually sufficient, assuming you can get VLANs to work properly with UDP broadcast on your hardware.)
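As a rough way to check for dropped UDP traffic on a node, you can read the kernel's UDP counters. The sketch below assumes a Linux host that exposes /proc/net/snmp (the same counters that `netstat -su` summarizes); the function names are made up for illustration and are not part of any Vertica tool. It is not a full diagnosis, only a quick look at whether the error counters are climbing.

```python
from pathlib import Path


def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict.

    The file holds two "Udp:" lines: one with counter names, one with values.
    """
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        return {}
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def udp_drop_delta(before, after,
                   keys=("InErrors", "RcvbufErrors", "SndbufErrors")):
    """Difference in UDP error counters between two snapshots."""
    return {key: after.get(key, 0) - before.get(key, 0) for key in keys}


if __name__ == "__main__":
    # Assumption: Linux host; on other systems this file does not exist.
    snmp_file = Path("/proc/net/snmp")
    if snmp_file.exists():
        print(parse_udp_counters(snmp_file.read_text()))
```

Taking one snapshot before and one after a busy period and passing both to `udp_drop_delta` shows whether InErrors or RcvbufErrors grew while the cluster was under load. Counters that keep rising on node 5 but not on the other nodes would point at that machine or its network path.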
Adam
Very detailed reply.
I will follow up with my hardware team to see if we can try this workaround.
Other than that:
1. All the nodes are in the same rack
2. How can I check whether there are dropped packets in the network?
3. How do I run the Vertica network test tool after Vertica is installed (netperf?)
Thanks