Moved Dev cluster to another location, rsync daemon launch fails on copycluster/restore
We recently moved our dev cluster from our local data center to an off-site location. I went through the re-IP process once the cluster was set up in the new data center, and Vertica itself is running fine. However, our weekly copycluster runs from our local data center to dev are now failing. Based on the log files generated, it seems that vbr.py can no longer remotely start the rsync daemon on the remote nodes. SSH key files have been updated, and both clusters can see each other. I can ssh without a password to any node from/to either cluster, and I can manually launch the rsync daemon remotely from/to either cluster. But vbr.py fails to do so.
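For example, this kind of check succeeds from a prod node against every dev node (the dev hostnames are placeholders, and 50000 is, I believe, the default port_rsync from the [Transmission] section of the vbr config; substitute yours if you've changed it):

    # Confirm the rsync daemon port vbr uses is reachable on each dev node
    for host in dev-node01 dev-node02 dev-node03; do
        nc -z -w 5 "$host" 50000 && echo "$host: open" || echo "$host: BLOCKED"
    done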
In our production cluster, the node_address field in the Nodes table is set to the private IP address of each node, and export_address is set to the public IPs. On the dev cluster, both are set to the public IPs. When both systems were on the same network here, it didn't matter; they could see each other on either IP. Obviously, now the private IPs of the prod cluster are no longer visible to dev. My question is: do Restore and Copy Cluster rely on the local IPs when starting rsync? When prod tries to start rsync on dev, it properly pulls the dev IPs, but the operation fails with a timeout from the rsyncd launcher. I wouldn't imagine the IPs of the source nodes matter in that case, because those aren't being passed to the target nodes; rsync is (or should be) launched from the command line, so only the destination location should matter.
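For reference, this is the query I'm comparing on both clusters to see which addresses vbr might be handing to rsync:

    # Run on each cluster; compare the two address columns
    vsql -c "SELECT node_name, node_address, export_address FROM nodes;"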
Hoping someone is familiar enough with this issue to reply.
Thanks!
Joe G
Comments
Vertica 7.2.3-12
For posterity, I believe this went through official support, and it was determined that there was an MTU mismatch in the network. Once the switch settings were corrected, copycluster completed normally.
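If anyone else hits the same symptom, a don't-fragment ping is a quick way to spot a path MTU mismatch before involving support (Linux iputils ping syntax; the dev hostname is a placeholder):

    # 1472 = 1500-byte MTU minus 28 bytes of IP/ICMP headers;
    # use 8972 instead if the link is supposed to carry 9000-byte jumbo frames
    ping -c 3 -M do -s 1472 dev-node01

If the large probes fail while small pings succeed, a switch or interface along the path is dropping frames above its MTU.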