SQL statements fail with "socket error: Resource temporarily unavailable"

In the database logs (Vertica log), we see these errors:

    Messenger::connectcb: socket error 110: Resource temporarily unavailable
    NetworkSendConsumer: open failed to node: v_db_node0001 (Detail: socket error: Resource temporarily unavailable) (tag: 1000 plan: 29128468250610)
    @v_db_node0002: V1004: NetworkSend: failed to open connection to node v_db_node0001 (socket error: Resource temporarily unavailable)

The Network Interface Cards also report high dropped packet counts in ifconfig.
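The dropped packet counts mentioned above can also be read without ifconfig. The following is a minimal sketch, assuming a Linux host that exposes the standard /proc/net/dev counters; the function name dropped_counts is hypothetical and not part of Vertica or any driver package.

```shell
# Print per-interface RX/TX drop counters from a /proc/net/dev style file.
# Argument 1 (optional): file to read; defaults to /proc/net/dev.
# Assumption: the standard layout of two header lines followed by
# "iface: <8 receive counters> <8 transmit counters>".
dropped_counts() {
    awk 'NR > 2 {
        sub(/:/, " ")   # turn "eth0:123" or "eth0: 123" into "eth0 123"
        print $1, "rx_drop=" $5, "tx_drop=" $13
    }' "${1:-/proc/net/dev}"
}
```

Calling dropped_counts twice, a few minutes apart, on each node shows whether the drop counters are still increasing while the SQL errors occur.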

Comments

  • We have seen this error with the combination of Linux 2.6 kernels and either Broadcom bnx2 drivers or Emulex NICs. Details for the two NIC types follow.

    A. This behavior is found only on Broadcom network cards running on kernel 2.6, and the problem is not reproducible by predefined steps. MSI (Message Signaled Interrupts) is enabled by default on kernel 2.6 and is not supported on 2.4. With Broadcom cards this causes intermittent network drops. Disabling MSI in the Broadcom bnx2 module resolves the problem.

    Procedure: unload the bnx2 driver, then run
        # modprobe bnx2 disable_msi=1
    and test. If that works, edit or create modprobe.d/bnx2.conf and add
        options bnx2 disable_msi=1
    to make the setting permanent.

    Ubuntu forums article: http://ubuntuforums.org/archive/index.php/t-1726045.html

    To troubleshoot this issue:
    1. Turn on debug logging: select set_debug_log('PROTOCOL','ALL'); select set_debug_log('VMPI','ALL');
    2. Gather /etc/hosts from all machines.
    3. Gather firewall rules from all machines (iptables -L).
    4. In 5.1, export requires that the private interfaces of one cluster be able to route to the private interfaces of the other cluster and vice versa (if there are separate private networks).

    B. A similar symptom may occur with Emulex fiber NICs on Linux releases starting with RHEL 5.5. In the kernel logs (dmesg, /var/log/messages) you will find:
        do_IRQ: 9.191 No irq handler for vector (irq-1)
    This problem appears similar in that it also looks like an interrupt issue. We found this article, which describes the Emulex counterpart of the Broadcom fix (disable_msi): https://bugzilla.redhat.com/show_bug.cgi?id=592018

    "The issue is interrupt related. The lpfc driver in RHEL 5.5 changed the default behavior of the driver to not use MSI or MSI-X. Setting the module parameter lpfc_use_msi to 2 (the default value in 5.4) fixes the timeouts and restores connectivity to storage. With lpfc_use_msi set to 2, the emulex adapters use MSI interrupts and work properly."

    In this case, there are two possible solutions.
    Solution 1: Follow the procedure in section A to disable MSI for the Emulex NICs.
    Solution 2: Change the parameter lpfc_use_msi to 2, as in the Red Hat bug report above.
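The permanent settings described in this comment (options bnx2 disable_msi=1, or lpfc_use_msi=2 for the Emulex lpfc driver) can be written with a small helper. This is a minimal sketch, not a tested procedure for any particular distribution: the function name write_module_option is hypothetical, and the /etc/modprobe.d directory used in the usage comments is an assumption about where modprobe.d lives on your system.

```shell
# Add an "options <module> <option>" line to <conf_dir>/<module>.conf.
# The line is appended only if it is not already present, so the helper
# can be run repeatedly without creating duplicates.
write_module_option() {
    conf_dir=$1
    module=$2
    option=$3
    conf_file="$conf_dir/$module.conf"
    line="options $module $option"
    # Skip the write if the exact line already exists in the file.
    if [ -f "$conf_file" ] && grep -qxF -- "$line" "$conf_file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$conf_file"
}

# Usage (as root; directory is an assumption):
#   write_module_option /etc/modprobe.d bnx2 disable_msi=1
#   write_module_option /etc/modprobe.d lpfc lpfc_use_msi=2
```

The option only takes effect the next time the module is loaded, so the driver still has to be reloaded (or the host rebooted) after the file is written.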
