[Vertica][VJDBC](4236) ROLLBACK: One or more nodes did not open a data connection to this node.
Hi,
I have a 3-node cluster DB (v9.2.0), and yesterday I found an NTP issue.
I fixed the clocks manually (there was a 1-10 minute time shift between nodes),
and a few minutes later most queries started failing.
now:
- Each node can ping others (same as before)
- There is no networking issue (vnetperf result is fine and spread is ok!)
- All nodes are UP (I stopped and started them recently, but it did not help)
- v_monitor.error_messages has many errors such as:
One or more nodes did not open a data connection to this node. This may indicate a network configuration problem. Check that the private interfaces used for communication among the cluster hosts reside in the same subnet and are returned first by host address lookup
Recceive on v_db_node0001: query has been canceled
RecvFiles on v_db_node0002: Open failed on node [v_db_node0003] ()
DataTargetProxy on v_db_node0002: handle is canceled
Please let me know what is going wrong.
Best Answer
mosheg Vertica Employee Administrator
Use a script to save the output of a set of diagnostic commands as a baseline
that represents a good system state, when everything works as expected.
When something goes wrong, run the same set of commands again and compare the output with the good baseline / log files.
CentOS examples (for each node in the cluster):
- View the IP Address, MAC address and MTU (Maximum transmission unit) size --> ifconfig
- Display connection info and routing table information --> route or netstat -r
- View the speed and duplex settings of your Network Interface Card (do this for each NIC) --> ethtool eth0
- Show the status of each network interface --> ip addr
- Check the network configuration files for each network interface --> cat /etc/sysconfig/network-scripts/ifcfg*
- Check DNS records by pinging each node's domain name from/to all nodes --> ping nodex.YourDomain and cat /etc/resolv.conf
- Another DNS check --> nslookup www.tecmint.com
- Check connectivity in both directions on all nodes, to/from the private and public networks --> ping each_node
- Trace the route an IP packet would follow to/from all your cluster nodes --> traceroute node01
- List all Firewall rules --> iptables -L
- Check Vertica environment requirements; See more options with --help
As root: /opt/vertica/oss/python3/bin/python3 -m vertica.local_verify
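The baseline approach described above could be scripted roughly as follows. This is only a minimal sketch, not an official Vertica tool: the function names (snapshot, compare), the DIAG_DIR variable, and the output file names are all made up for illustration, and the command list is a small subset of the checklist that you would adjust for your own hosts.

```shell
#!/bin/sh
# Sketch: record network diagnostics to a file and diff against a known-good baseline.
# All names below (DIAG_DIR, snapshot, compare, file names) are hypothetical.

# Diagnostic commands, one per line (a subset of the checklist above).
DIAG_CMDS='ip addr
route -n
cat /etc/resolv.conf
iptables -L -n'

DIAG_DIR=${DIAG_DIR:-/tmp}
BASE="$DIAG_DIR/netdiag_baseline_$(uname -n).txt"
CUR="$DIAG_DIR/netdiag_current_$(uname -n).txt"

# snapshot OUTFILE: run every command in DIAG_CMDS and record its output.
snapshot() {
    out=$1
    : > "$out"
    printf '%s\n' "$DIAG_CMDS" | while IFS= read -r cmd; do
        [ -n "$cmd" ] || continue
        {
            printf '### %s\n' "$cmd"
            # stdin is redirected so the command cannot consume the command list
            sh -c "$cmd" </dev/null 2>&1 || printf '(command failed or is not installed)\n'
            printf '\n'
        } >> "$out"
    done
}

# compare BASELINE CURRENT: print a unified diff; exit status 0 means identical.
compare() {
    diff -u "$1" "$2"
}

# Usage: sh netdiag.sh baseline   (while the cluster is healthy)
#        sh netdiag.sh check      (when something goes wrong)
case "${1:-}" in
    baseline) snapshot "$BASE" ;;
    check)    snapshot "$CUR"; compare "$BASE" "$CUR" ;;
esac
```

Run it on every node, since a routing or firewall difference may exist on one host only. iptables -L needs root, and ifconfig is left out here because its packet counters change on every run and would make the diff noisy.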
Answers
Is SSL enabled? Clock drift will cause issues with SSL and may require reboot of all hosts to synchronize clocks.
Were there any messages in vertica.log on the initiator node, or messages in dmesg or /var/log/messages on any node indicating a system issue?
It's recommended to open a support case if you can.
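One possible way to do the log check suggested above is to scan each log file for the error strings quoted in this thread. This is a rough sketch only: scan_log and PATTERNS are hypothetical names, the pattern list is just the messages from the original post, and the log paths are whatever applies on your hosts (vertica.log is normally found in the node's catalog directory).

```shell
#!/bin/sh
# Sketch: search log files for the data-connection errors quoted in this thread.
# scan_log and PATTERNS are hypothetical names used for illustration.

# Fixed strings, one per line, taken from the error messages in the question.
PATTERNS='did not open a data connection
Open failed on node
has been canceled'

# scan_log FILE: print matching lines with line numbers.
# Returns non-zero if nothing matches or the file cannot be read.
scan_log() {
    [ -r "$1" ] || { printf 'cannot read %s\n' "$1" >&2; return 1; }
    grep -n -F -e "$PATTERNS" -- "$1"
}

# Scan every file given on the command line, for example:
#   sh scan_logs.sh /var/log/messages <catalog directory>/vertica.log
for f in "$@"; do
    printf '== %s\n' "$f"
    scan_log "$f"
done
```

Kernel messages can be checked the same way by saving the output of dmesg to a file first and passing that file to the script.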
Thanks Bryan,
No, SSL is not enabled.
After 2 days I found the issue. As mentioned in the error log, it was a networking issue, but I don't know exactly what it was!
I deleted all routes on the nodes and redefined all of them.
Now I'm curious about it, because at least ping was OK and spread (traffic) was also working!
How should I troubleshoot it next time?