Node goes down frequently

Navin_C Vertica Customer
Hello all,
I am experiencing issues with nodes.

One of my nodes goes down very frequently. I checked the vertica.log file and found this at the time the node went down:

2014-03-18 18:13:22.298 Spread Client:0x7fb58f0 [Comms] <INFO> nodeSetNotifier: node node0005 left the cluster
2014-03-18 18:13:22.299 Spread Client:0x7fb58f0 [Comms] <INFO> Removing #node_b#N010102010107->node0005 from processToNode and other maps due to departure from Vertica:all

From this point onwards, I see "Received no response from node0005 in transactions" messages.

It looks like there is some issue with Spread.
What might be the reason behind this?
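
To see how often, and at what times, a node is dropped, the log can be scanned for the departure message quoted above. The following is a minimal sketch, not an official Vertica tool: the regular expression is based only on the two log lines shown in this post, and the function names `find_departures` and `count_by_node` are made up for illustration. Other Vertica versions may format these lines differently.

```python
import re
import sys
from collections import Counter

# Assumption: departure lines look like the excerpt quoted in this thread, e.g.
# "2014-03-18 18:13:22.298 ... nodeSetNotifier: node node0005 left the cluster"
DEPARTURE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"nodeSetNotifier: node (?P<node>\S+) left the cluster"
)


def find_departures(lines):
    """Return (timestamp, node) pairs for every 'left the cluster' line."""
    events = []
    for line in lines:
        match = DEPARTURE.search(line)
        if match:
            events.append((match.group("ts"), match.group("node")))
    return events


def count_by_node(events):
    """Count how many times each node left the cluster."""
    return Counter(node for _, node in events)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python departures.py /path/to/vertica.log
    with open(sys.argv[1], errors="replace") as fh:
        found = find_departures(fh)
    for ts, node in found:
        print(ts, node)
    for node, n in count_by_node(found).items():
        print(f"{node}: {n} departure(s)")
```

Comparing the printed timestamps with periods of heavy load or with network events may help narrow down the cause.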

Comments

  • Hi Navin,

    Spread issues are often the result of network issues.  Specifically, Spread sends a lot of UDP messages.  If they don't get through, Spread can't keep the node synchronized with the cluster, so it drops the node from the cluster.

    What does your network topology look like?  Is node 5 on a different rack, behind a router, etc.?  Are you seeing dropped packets on the machine?  Does Vertica's network-test tool report any issues?  Are there any other indications of a flaky network?

    We occasionally see this in clusters with very high traffic (lots of resegments, etc.).  Under high traffic volumes, some switches are configured to prioritize TCP over UDP, in which case the cluster's data traffic can starve its control-message traffic.  The workaround is to reconfigure the switch to treat UDP fairly, or to put Spread traffic on a different network.  (A different physical network is ideal, but a separate VLAN on the same physical network is usually sufficient, assuming you can get VLANs to work properly with UDP broadcast on your hardware.)

    Adam
  • Navin_C Vertica Customer
    Thanks for that, Adam.

    Very detailed reply. 
    I will follow up with my hardware team to see if we can try this workaround.

    Other than that:
    1. All the nodes are in the same rack.
    2. How can I check whether there are dropped packets on the network?
    3. How do I run the Vertica network test tool after Vertica is installed (netperf?)

    Thanks
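
On the dropped-packets question, one possible starting point on a Linux host is the kernel's UDP counters in /proc/net/snmp. The sketch below is only an illustration under that assumption (Linux, standard `Udp:` header and value lines). It takes two snapshots and reports how much `InErrors`, `RcvbufErrors` and `SndbufErrors` grew in between. The function names are hypothetical, and this only shows drops at the host's UDP layer, not drops inside a switch.

```python
import sys
import time

SNMP_PATH = "/proc/net/snmp"  # Linux-specific; assumed to exist on the node
WATCHED = ("InErrors", "RcvbufErrors", "SndbufErrors")


def parse_udp_counters(snmp_text):
    """Parse the 'Udp:' header/value line pair into a name -> int dict."""
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no 'Udp:' header and value lines found")
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def udp_error_delta(before, after):
    """Return the growth of each watched error counter between snapshots."""
    return {name: after.get(name, 0) - before.get(name, 0) for name in WATCHED}


def read_snapshot(path=SNMP_PATH):
    with open(path) as fh:
        return parse_udp_counters(fh.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python udp_check.py 10   (sample interval in seconds)
    interval = float(sys.argv[1])
    first = read_snapshot()
    time.sleep(interval)
    second = read_snapshot()
    for name, grew in udp_error_delta(first, second).items():
        print(f"{name}: +{grew} in {interval:g}s")
```

Running this on the node that keeps leaving the cluster while the cluster is under load, and again on a healthy node, gives a rough comparison. Counters that keep growing on node0005 but not elsewhere would point toward that host or its link.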
