Error connecting pt-pt socket for send

When running multi-node Vertica clusters with certain versions of the Linux kernel (listed below), queries intermittently fail with the error message: "Error connecting pt-pt socket for send: Cannot assign requested address". This appears to be caused by a problem in Red Hat Enterprise Linux (RHEL) 5 and 6, as well as in a number of security patches issued by Red Hat. Vertica is working with Red Hat to identify and help resolve it. The problem has been observed with Vertica 5.0, 5.1, and 6.x when running on specific versions of the Linux kernel, including the default kernels distributed in Red Hat Enterprise Linux 6 and derivative distributions (e.g. CentOS 6).

Comments

  • In these cases, when one Vertica node calls the 'connect' system call to open a TCP socket to another Vertica node, the kernel sometimes returns EADDRNOTAVAIL ('Cannot assign requested address'). Each query plan opens and closes many TCP connections.

    Resolution: Hotfix 6.0.1-4 introduces logic so that, when the kernel refuses to open a connection and returns EADDRNOTAVAIL, Vertica retries the same connect request.

    Notes:

    Kernel versions known to have exhibited the problem:
        2.6.32-279.11.1.el6.x86_64
        2.6.32-279.11.1.el6
        2.6.32-279.2.1.el6.x86_64
        2.6.32-279.5.1.el6.x86_64
        2.6.32-279.5.2.el6.x86_64
        2.6.32-276.el6.x86_64
        2.6.32-262.el6.x86_64
        2.6.32-220.23.1.el6

    Kernel versions that have fixed the problem (customers who moved to these versions no longer see it):
        3.0.30-rt50.62.el6rt.x86_64
        2.6.32-220.2.1.el6

    Security patches issued by Red Hat have also introduced the problem, so the list of affected kernels above is not exhaustive.
  • I would recommend that you use the 6.0SP2 (6.0.2) version that is available on the web site. This version has a workaround for the Red Hat issue that you are encountering. In the 6.0.1-4 hotfix the code was changed to simply retry the query when this error occurred; it was not a fix. In 6.0.2 we actually changed our code to work around the Red Hat issue. This is also resolved in version 6.1.1 and later.
    Amy
  • awong (Registered User)
    Hi,

    Any idea about this error? I'm using spread point-to-point connections and get this error every once in a while.
    [Vertica][VJDBC](4054) ERROR: NetworkSend on v_dw_node0001: failed to open connection to node v_dw_node0005 (socket error: Connection timed out)
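
The retry behaviour described for hotfix 6.0.1-4 in the first comment (retrying the same connect request when the kernel returns EADDRNOTAVAIL) can be illustrated with a minimal Python sketch. This is not Vertica's code. The function names, the attempt count, and the delay between attempts are all assumptions chosen for illustration; the thread does not say how many times or how quickly Vertica retries.

```python
import errno
import socket
import time


def _connect_once(host, port):
    """Open one TCP connection, closing the socket if connect() fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.connect((host, port))
    except OSError:
        sock.close()
        raise
    return sock


def connect_with_retry(host, port, attempts=5, delay=0.05, connect=_connect_once):
    """Retry a connect only when it fails with EADDRNOTAVAIL.

    attempts and delay are hypothetical values, not Vertica settings.
    Any other socket error is raised immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect(host, port)
        except OSError as exc:
            if exc.errno != errno.EADDRNOTAVAIL:
                raise  # unrelated failure (refused, timed out, ...): do not retry
            last_error = exc
            time.sleep(delay * (attempt + 1))  # short, growing pause before retrying
    raise last_error
```

A caller would use `connect_with_retry(host, port)` in place of a bare connect. Note that a retry like this only masks the kernel behaviour, which matches the second comment's point that the hotfix was a retry rather than a fix. It also would not help with the "Connection timed out" error in the last comment, since that is a different errno and is re-raised immediately.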
