rsync error: error in socket IO (code 10)
Hi,
I am trying to back up my database to remote servers using the vbr.py script. I set up passwordless ssh for the dbadmin user between the production server and the backup server, and ssh works fine, but the backup fails with the following error:
rsync: failed to connect to 15.224.232.169: Connection timed out (110)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.7]
rsync failed!
53961: vbr client subproc on 1.1.54.3 terminates with returncode 1. Details in vbr_v_verprd1_node0002_client.log on that host.
rsync: failed to connect to 15.224.232.168: Connection timed out (110)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.7]
rsync failed!
Child processes terminated abnormally.
backup failed!
Some of the files are copied and then it suddenly terminates. Can you please help me understand why this is happening and what the fix is?
Thanks
Saumya
Please verify that you have passwordless ssh access to all nodes in the cluster, and also passwordless ssh access from each node to itself.
Missing passwordless ssh to the node itself is often the problem.
Missing passwordless ssh on any one of the nodes could also cause this error.
vbr.py uses port 50000 for rsync; check that this port is open.
By default, rsync itself uses port 873.
To test for open ports:
1. On each node, including node 1, listen on port <number>:
nc -l 873
Each node goes into listen mode and waits for remote input on that port.
2. In another session on node 1, send a message over port <number>:
nc nodename 873
Here nodename is the hostname or IP address of node 1. nc goes into send mode and waits for your keystrokes.
Entering text and pressing Return should push the text across the port and display it on the listening machine.
Ctrl+C gets you out.
Repeat for each node to ensure the port is open on all nodes from the initiator node.
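The manual nc check above can be scripted when there are many hosts. This is only a minimal sketch, assuming your nc build supports -z (probe without sending data) and -w (timeout). The function names and the host names in the usage comment are placeholders, and PORT defaults to the 50000 mentioned in this thread. Something has to be listening on the target port (an nc -l session, or the rsync daemon that vbr.py starts) for the probe to report "open".

```shell
#!/bin/sh
# Port to probe; defaults to the vbr.py rsync port discussed in this thread.
PORT="${PORT:-50000}"

# check_port HOST PORT: print "open" if a TCP connection succeeds within
# 5 seconds, otherwise "unreachable" (refused, timed out, or nc missing).
check_port() {
    if nc -z -w 5 "$1" "$2" >/dev/null 2>&1; then
        echo "open"
    else
        echo "unreachable"
    fi
}

# report HOST...: print one "host:port status" line per host.
report() {
    for host in "$@"; do
        printf '%s:%s %s\n' "$host" "$PORT" "$(check_port "$host" "$PORT")"
    done
}

# Usage (hypothetical host names):
#   report backuphost01 backuphost02
```

Run it from every production node, since each node connects to its backup host on its own.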
Other Vertica Ports
Vertica
5433 TCP (All connections)
Spread
4803 TCP (Client connections)
4803 UDP (Daemon <-> Daemon)
4804 UDP (Daemon <-> Daemon)
4805 UDP (Monitor to Daemon) (optional and only if "DangerousMonitor = yes" in config file)
Regards,
Abhishek
Passwordless ssh works from all the production servers to the remote backup hosts. I do see that the backup directory is created on the remote host and some files are copied.
However, passwordless ssh from the backup host to the production server is not working. Can that be an issue? Is it required, or is passwordless ssh from production to the backup server enough?
Thanks
Saumya
ssh is now set up between both servers and the ports are open too, but the backup still fails with the same error.
Please let me know what is wrong here.
Thanks
saumya
############
rsync manually
############
[dbadmin@msast001pvdb01 config]$ rsync -avr --rsh=/usr/bin/ssh /opt/vertica/config/prod_backup_remote.ini VERRWCSTDB01:/opt/vertica/config/
sending incremental file list
prod_backup_remote.ini
sent 666 bytes received 31 bytes 464.67 bytes/sec
total size is 557 speedup is 0.80
############
Error during backup:
############
rsync: failed to connect to 192.168.201.202: Connection timed out (110)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.7]
rsync failed!
Child processes terminated abnormally.
How do I do that?
Thanks
Saumya
I tried both ports, 873 and 50000. With 873 I get "Connection refused", and with 50000 I get "Connection timed out". Please let me know how to fix this. We have not had backups for a long time and really need this running.
## when using 873
rsync: failed to connect to 192.168.201.204: Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.7]
rsync failed!
[root@VERRWCSTDB01 ~]# cat /tmp/vbr/vbr_rsyncd.log
2014/04/01 11:44:19 [28758] rsyncd version 3.0.7 starting, listening on port 873
2014/04/01 11:44:19 [28758] bind() failed: Permission denied (address-family 2)
2014/04/01 11:44:19 [28758] socket(10,1,6) failed: Address family not supported by protocol
2014/04/01 11:44:19 [28758] unable to bind any inbound sockets on port 873
2014/04/01 11:44:19 [28758] rsync error: error in socket IO (code 10) at socket.c(541) [Receiver=3.0.7]
2014/04/01 11:48:46 [29388] rsyncd version 3.0.7 starting, listening on port 873
2014/04/01 11:48:46 [29388] bind() failed: Permission denied (address-family 2)
2014/04/01 11:48:46 [29388] socket(10,1,6) failed: Address family not supported by protocol
2014/04/01 11:48:46 [29388] unable to bind any inbound sockets on port 873
2014/04/01 11:48:46 [29388] rsync error: error in socket IO (code 10) at socket.c(541) [Receiver=3.0.7]
### When using 50000
rsync: failed to connect to 192.168.201.202: Connection timed out (110)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.7]
rsync failed!
Thanks
Saumya
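The two error numbers in these logs usually point at different causes, and a small helper can spell that out. This is a sketch and not part of vbr.py; explain_rsync_error is a made-up name. On Linux, 110 (ETIMEDOUT) generally means no reply came back at all, typically because a firewall is silently dropping the packets, while 111 (ECONNREFUSED) means the host answered but nothing was listening on that port. The second case is consistent with the rsyncd log above, where the daemon could not bind port 873: ports below 1024 normally require root privileges to bind.

```shell
#!/bin/sh
# explain_rsync_error LINE: map the errno shown in an rsync
# "failed to connect" line to its most likely cause.
explain_rsync_error() {
    case "$1" in
        *"(110)"*) echo "timed out: packets are probably being dropped by a firewall" ;;
        *"(111)"*) echo "refused: host reachable, but nothing is listening on that port" ;;
        *)         echo "unrecognized" ;;
    esac
}

# Example using a line from the logs above.
explain_rsync_error "rsync: failed to connect to 192.168.201.202: Connection timed out (110)"
```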
[dbadmin@msast001pvdb01 ~]$ ps -ef |grep rsync
dbadmin 11250 1 0 Jan02 ? 00:00:00 /opt/vertica/bin/rsync --daemon --config=/tmp/vbr/vbr_rsyncd.conf --port=50000
[dbadmin@msast001pvdb01 ~]$ /opt/vertica/bin/rsync -avz /data/backups dbadmin@VERRWCSTDB01:/data/backups
sending incremental file list
backups/
backups/v_verprd1_node0001/
backups/v_verprd1_node0001/.production_backup_test.done/
backups/v_verprd1_node0001/production_backup_test/
backups/v_verprd1_node0001/production_backup_test/production_backup_test.info
backups/v_verprd1_node0001/production_backup_test/production_backup_test.txt
backups/v_verprd1_node0001/production_backup_test/catalog/
backups/v_verprd1_node0001/production_backup_test/catalog/VERPRD1/
backups/v_verprd1_node0001/production_backup_test/catalog/VERPRD1/v_verprd1_node0001_catalog/
backups/v_verprd1_node0001/production_backup_test/catalog/VERPRD1/v_verprd1_node0001_catalog/vertica.conf
backups/v_verprd1_node0001/production_backup_test/catalog/VERPRD1/v_verprd1_node0001_catalog/Snapshots/
backups/v_verprd1_node0001/production_backup_test/catalog/VERPRD1/v_verprd1_node0001_catalog/Snapshots/catalog.ctlg
backups/v_verprd1_node0001/production_backup_test/data/
backups/v_verprd1_node0001/production_backup_test/data/VERPRD1/
backups/v_verprd1_node0001/production_backup_test/data/VERPRD1/v_verprd1_node0001_data/
Yes, I checked port 50000 with netstat while the backup was running under vbr.py, and it is open. I also see the message below in the log:
[root@VERRWCSTDB01 vbr]# tail -f vbr_rsyncd.log
2014/04/01 15:01:21 [20252] rsyncd version 3.0.7 starting, listening on port 50000
While the backup is running, I also see that port 873 is being used by rsync on all the nodes, so that is open too:
egrep rsync /etc/services
rsync 873/tcp # rsync
rsync 873/udp # rsync
airsync 2175/tcp # Microsoft Desktop AirSync Protocol
airsync 2175/udp # Microsoft Desktop AirSync Protocol
Thanks
Saumya
I tried two other ports, 50001 and 50002, and got the same error:
rsync: failed to connect to 192.168.201.202: Connection timed out (110)
rsync error: error in socket IO (code 10) at clientserver.c(122) [sender=3.0.7]
rsync failed!
Child processes terminated abnormally.
backup failed!
On production host:
[dbadmin@msast001pvdb01 config]$ cat //tmp/vbr/vbr26212.log
2014-04-02 09:33:27 Helper cancelling process entry
2014-04-02 09:33:27 ps aux | grep /tmp/vbr/vbr.py | grep -v grep | grep client
[dbadmin@msast001pvdb01 config]$ cat //tmp/vbr/vbr_v_verprd1_node0001_client.log
2014-04-02 09:26:52 Transfer client process entry: my pid is 13506; task is backup.
2014-04-02 09:26:52 Read lock acquired on .ctlg file
2014-04-02 09:26:52 linking/copying special files at client: .ctlg, .txt, .conf
2014-04-02 09:26:53 Dry-run to find transfer size
2014-04-02 09:27:56 rsync failed with code 10
2014-04-02 09:27:56 rsync failed!
2014-04-02 09:32:23 Transfer client process entry: my pid is 23333; task is backup.
2014-04-02 09:32:23 Read lock acquired on .ctlg file
2014-04-02 09:32:23 linking/copying special files at client: .ctlg, .txt, .conf
2014-04-02 09:32:24 Dry-run to find transfer size
2014-04-02 09:33:27 rsync failed with code 10
2014-04-02 09:33:27 rsync failed!
[dbadmin@msast001pvdb01 config]$
On remote host:
[root@VERRWCSTDB01 vbr]# cat vbr_v_verprd1_node0001_server.log
2014-04-02 09:27:00 Transfer Server process entry: my pid is 17560; task is backup.
2014-04-02 09:27:00 Acquiring remoteServer mutex
2014-04-02 09:27:00 Acquired remoteServer mutex
2014-04-02 09:27:00 ps aux | grep rsync | grep -v grep | grep daemon | grep port=50001
2014-04-02 09:27:00
2014-04-02 09:27:00 Rsync daemon is now running
2014-04-02 09:27:00 Released mutex
2014-04-02 09:27:00 Transfer Server process exit
2014-04-02 09:32:31 Transfer Server process entry: my pid is 18329; task is backup.
2014-04-02 09:32:31 Acquiring remoteServer mutex
2014-04-02 09:32:31 Acquired remoteServer mutex
2014-04-02 09:32:31 ps aux | grep rsync | grep -v grep | grep daemon | grep port=50001
2014-04-02 09:32:31 dbadmin 17578 0.0 0.0 107680 656 ? Ss 09:27 0:00 /opt/vertica/bin/rsync --daemon --config=/tmp/vbr/vbr_rsyncd.conf --port=50001
2014-04-02 09:32:31 Rsync daemon is already running
2014-04-02 09:32:31 Released mutex
2014-04-02 09:32:31 Transfer Server process exit
[root@VERRWCSTDB01 vbr]# cat vbr_rsyncd.log
2014/04/02 09:27:00 [17578] rsyncd version 3.0.7 starting, listening on port 50001
Thanks
saumya
- Your rsync isn't running the minimum required version for Vertica.
- You can do passwordless ssh back and forth using root and dbadmin.
- No rsync daemons are running when you try to perform the backup.
I shall post more ideas if I can think of something else.
The answer is yes for all. I verified all those again and they look fine.
Thanks
saumya
It's true for all the nodes that I am running it on.
Thanks
Saumya
Is there any update on this issue? Is there anything else that I am missing?
Thanks
Saumya
Now it's running fine.
Thanks
Saumya
We are facing the same issue. Can you please let me know what changes you made in the firewall to make it work?
Thanks!!
I am having the same issue. Do you remember the firewall changes?
I know this is an old post, but I had the same issue and wanted to post what I did to remediate it. I had two separate clusters in two separate datacenters and was attempting to use the vbr script to replicate some objects from one cluster to the other. The source cluster runs on dedicated servers, while the target cluster was spun up in AWS. I was getting the same connection timeout error, and troubleshot it with the vbr logs (/tmp/vbr).
The fix was to add firewall rules to my target AWS cluster to allow TCP connections over port 50000 from my source cluster. Hope this helps anyone who stumbles across this.
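For anyone looking for the shape of such a rule, here is a hedged sketch. It assumes a host-level iptables firewall on the target nodes; the post above does not say which firewall was changed, and on AWS the equivalent may be an inbound rule in the security group instead. The function name and the source address 203.0.113.10 are placeholders. The function only prints the command so it can be reviewed before being run as root.

```shell
#!/bin/sh
# allow_rsync_rule SRC_IP [PORT]: print (do not run) an iptables rule that
# accepts inbound TCP from SRC_IP on PORT (default 50000, the vbr.py rsync port).
allow_rsync_rule() {
    src="$1"
    port="${2:-50000}"
    printf 'iptables -I INPUT -p tcp -s %s --dport %s -j ACCEPT\n' "$src" "$port"
}

# Placeholder source address; substitute each node of the source cluster.
allow_rsync_rule 203.0.113.10
```

One rule per source node is needed on each target host.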