Error: failed to open connection to node

We get this error while running a query that joins a few objects in v_catalog with an object in v_monitor.

 

"[Vertica][VJDBC](4054) ERROR: NetworkSend on <node name>: failed to open connection to node <Node Name>(Transport channel closed)"

 

Here is the list of objects involved in the query.

 

v_catalog.tables

v_monitor.projection_storage

v_catalog.projections

v_catalog.tables

 

 

 

The query runs fine if we rerun it after it fails, and we could not find a reason why it fails in the first place. Any suggestions would be greatly appreciated.

Comments

  • Hi Guna,

     

    You said that the error goes away when you retry. When does it come back? Does it only happen some of the time? If so, how frequently? 1%, 10%, 50%?

     

    The error indicates that Vertica's node-to-node communication has failed. Network connections are vulnerable to environmental interference (i.e. network hiccups), and such failures are more likely in a virtualized environment, which usually has less reliable networking between the nodes.

     

    Node-to-node communication failures typically cause the query to be retried internally. Vertica retries a limited number of times before showing the error to the user.

     

    Derrick
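
Since the query succeeds when rerun, a client-side retry is one way to ride out rare transient failures. The following is a minimal sketch, not a Vertica feature: `run_with_retry` and `run_query` are hypothetical names, `run_query` stands for whatever zero-argument callable executes the query in your client code, and matching on the error text is an assumption about how the exception surfaces in that client.

```python
import time

# Text taken from the error message quoted in the original post.
TRANSIENT_MARKER = "failed to open connection to node"


def is_transient(exc):
    """Assumption: the transient failure can be recognized by its message text."""
    return TRANSIENT_MARKER in str(exc)


def run_with_retry(run_query, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call run_query(), retrying transient failures with exponential backoff.

    run_query is a hypothetical zero-argument callable supplied by the caller.
    Non-transient errors, and the final failed attempt, are re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            return run_query()
        except Exception as exc:
            if not is_transient(exc) or attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Retrying hides the symptom only. The underlying network issue is still worth investigating.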

  • Thank you for the information, DerrickR. This is a VM environment and the error occurs rarely (< 1%). I will bring it to our sys admin's attention and ask them to monitor the network connectivity between the nodes.

  • Guna,

     

    Glad to be of help. Keep in mind the scaling function in this case...

     

    Vertica makes a connection between every pair of nodes. It must, because data from any node may need to go to any other node for execution.

     

    If you have N nodes, then you have roughly N-squared total connections (N(N-1)/2 distinct node pairs, to be exact). As N increases, the number of connections grows rather quickly.

     

    If you are observing a 1% query failure rate, that indicates a failure rate of something like 0.01/(N^2) per connection on your network. So your sys admin may claim that things are very healthy, but even a very low failure rate on individual connections can cause the errors you are seeing.

     

    A stable and reliable node-to-node network is an assumption of Vertica's current architecture. Vertica is specifically not well suited to a wide-area network or a globally distributed cluster layout.
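
The scaling estimate above can be worked through numerically. This is a rough sketch under two simplifying assumptions that are not from the thread: every query depends on every node pair, and connections fail independently with the same probability. The function names are illustrative only.

```python
def connection_pairs(n_nodes):
    """Number of distinct node pairs in a fully connected cluster: N(N-1)/2."""
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    return n_nodes * (n_nodes - 1) // 2


def query_failure_rate(link_failure_rate, n_nodes):
    """Probability that at least one connection fails during a query."""
    links = connection_pairs(n_nodes)
    return 1.0 - (1.0 - link_failure_rate) ** links


def per_link_failure_rate(observed_query_failure_rate, n_nodes):
    """Invert query_failure_rate: per-connection rate implied by the observed rate."""
    links = connection_pairs(n_nodes)
    return 1.0 - (1.0 - observed_query_failure_rate) ** (1.0 / links)


if __name__ == "__main__":
    observed = 0.01  # the roughly 1% query failure rate reported in the thread
    for n in (3, 10, 30):
        implied = per_link_failure_rate(observed, n)
        rough = observed / n ** 2  # the back-of-envelope 0.01/(N^2) figure
        print(f"N={n:>2} pairs={connection_pairs(n):>3} "
              f"implied per-link rate={implied:.2e} rough estimate={rough:.2e}")
```

Both figures shrink quickly as N grows, which is the point of the comment: a per-connection failure rate that looks negligible in network monitoring can still produce a visible query failure rate.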
