Spread Retransmit Rate
I have upgraded both Vertica and the Management Console to 7.1.0. There's a new graph in the MC, Spread Retransmit Rate, which fluctuates every minute between 0 and 50%. Is this within the expected range, or could there be an issue here?
It's a 4 node cluster and every node shows roughly the same characteristics.
Cheers,
Mark
Comments
I believe, though someone here should confirm, that this graph indicates how often Spread has to retransmit its requests because of dropped packets.
If your network is in good working order, and is not running at or near maximum capacity, you should not be seeing dropped packets at all. So this can indicate a network configuration or congestion issue.
Note that Spread uses UDP (so that it can take advantage of UDP Broadcast -- there's no good equivalent way to broadcast data to many nodes simultaneously over TCP). A common issue with Spread is that some switches give UDP packets a lower priority than TCP packets, on the heuristic that UDP traffic is less important in many systems. But in Vertica, Spread is our control layer, so that's our most important traffic.
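If you want a quick look at whether a node's own kernel is discarding UDP datagrams, something like the sketch below may help. It assumes the nodes run Linux and expose the usual `/proc/net/snmp` counters; the function names are made up for illustration and are not part of Vertica or Spread. It only sees drops on the host itself, so packets lost in a switch would not show up here.

```python
def parse_udp_counters(snmp_text):
    """Parse the 'Udp:' section of /proc/net/snmp-style text into a dict."""
    header = None
    for line in snmp_text.splitlines():
        if not line.startswith("Udp:"):
            continue  # skips Tcp:, Ip:, UdpLite:, etc.
        fields = line.split()[1:]
        if header is None:
            header = fields  # the first Udp: line holds the column names
        else:
            return dict(zip(header, (int(v) for v in fields)))
    raise ValueError("no Udp section found")


def read_udp_counters(path="/proc/net/snmp"):
    """Read the current UDP counters from the kernel (Linux only)."""
    with open(path) as f:
        return parse_udp_counters(f.read())


def udp_error_delta(before, after):
    """Return how much each UDP error counter grew between two samples."""
    keys = ("InErrors", "RcvbufErrors", "SndbufErrors")
    return {k: after[k] - before[k] for k in keys if k in before and k in after}
```

Calling `read_udp_counters()` twice, say a minute apart, and passing both results to `udp_error_delta` shows how the error counters grew over that minute. Counters that keep climbing would suggest the host is dropping datagrams; flat counters alongside a high retransmit rate would point more toward the network between the nodes.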
If this becomes a real problem, the symptoms are:
- Nodes dropping out of the network unpredictably, then re-joining
- Long latencies in things like starting new Vertica sessions, COMMIT, etc. -- anything that requires a cluster-wide operation.
If you're not seeing those symptoms, then there's no need to act now; but keep this in mind as a possible cause if you start seeing problems down the road.
If you are seeing these symptoms, the solution depends on the specifics of the problem, but it is generally one of the following:
- Adjust your switch's configuration to prioritize UDP.
- Give Vertica its own IP subnet, so other broadcast traffic on your network doesn't interfere with Vertica.
- If all else fails, give Spread a dedicated VLAN or physical network separate from Vertica's data layer, so that big node<->node resegment operations don't interfere with the control messages that we send over Spread.
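As a small sanity check for the "own IP subnet" option, the sketch below uses Python's standard `ipaddress` module to confirm that a set of node addresses all fall in one subnet and to show that subnet's broadcast address. The addresses and the /24 prefix length are hypothetical placeholders, and `shared_subnet` is an illustrative helper, not a Vertica tool.

```python
import ipaddress


def shared_subnet(node_addrs, prefix_len):
    """Return the common network if every address is in the same subnet, else None."""
    networks = {
        ipaddress.ip_interface(f"{addr}/{prefix_len}").network
        for addr in node_addrs
    }
    return networks.pop() if len(networks) == 1 else None


# Hypothetical private-interconnect addresses for a 4 node cluster.
nodes = ["10.20.30.11", "10.20.30.12", "10.20.30.13", "10.20.30.14"]
net = shared_subnet(nodes, 24)
if net is None:
    print("nodes are spread across more than one subnet")
else:
    print(f"subnet {net}, broadcast address {net.broadcast_address}")
```

This only checks the addressing plan. Whether other hosts also sit on that subnet and broadcast into it is something to confirm with whoever manages the network.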
Adam
Thanks for the reply, very helpful. I will work with the network guys and focus on the areas you mention. I'm not seeing any negative symptoms at the moment, but it is a new cluster and not fully live yet. Spread is set up to use private interconnects between the nodes on its own dedicated VLAN.
Mark