vbr.py-Based Backups From Multiple Nodes To Single Server Partially Fail
Problem:
We are attempting to use vbr.py to back up our 18-node Vertica cluster to a single backup server. The backups partially succeed:
There is sufficient storage on the backup server to hold all Vertica files from the cluster.
Only two of the directories in the backup location have the expected amount of data.
Details:
During the backup, vbr.py returns the following error message:
[dbadmin@backup1]$ vbr.py --task backup --config-file primary_backup.ini
Found Database port: 5433
Copying...
27839: vbr server subproc on 10.xx.xx.xx terminates with returncode 255. Details in vbr_v_test_node0010_server.log on that host.
Error msg: ssh_exchange_identification: Connection closed by remote host
Solution:
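Increase the number of concurrent unauthenticated connections that sshd allows on the backup host. With 18 nodes opening ssh connections to one server at about the same time, some connections are likely to exceed the MaxStartups limit and be dropped, which matches the ssh_exchange_identification error above.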
root@host # grep MaxStartups /etc/ssh/sshd_config
# Old Style
MaxStartups 12
# New Style
MaxStartups 10:30:60
MaxStartups
Specifies the maximum number of concurrent unauthenticated connections to the SSH daemon.
Additional connections will be dropped until authentication succeeds or the LoginGraceTime expires for a connection. The default is 10.
Alternatively, random early drop can be enabled by specifying the three colon-separated values "start:rate:full" (e.g. "10:30:60"). sshd will refuse connection attempts with a probability of "rate/100" (30%) if there are currently "start" (10) unauthenticated connections. The probability increases linearly and all connection attempts are refused if the number of unauthenticated connections reaches "full" (60).
LoginGraceTime
The server disconnects after this time if the user has not successfully logged in. If the value is 0, there is no time limit. The default is 120 seconds.
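To get a feel for the random early drop rule described above, the following shell sketch estimates the refusal percentage for a given number of pending unauthenticated connections. The function name drop_pct is hypothetical, and the integer linear interpolation is an assumption based on the man page wording rather than something confirmed against the sshd source.

```shell
# drop_pct N START RATE FULL   (hypothetical helper, not part of OpenSSH)
# Prints the estimated percent chance that sshd refuses a new connection
# while N unauthenticated connections are already open, given
# "MaxStartups START:RATE:FULL".
drop_pct() {
    n=$1; start=$2; rate=$3; full=$4
    if [ "$n" -lt "$start" ]; then
        # Below the start threshold nothing is refused.
        echo 0
    elif [ "$n" -ge "$full" ]; then
        # At or above the full threshold everything is refused.
        echo 100
    else
        # Assumed linear ramp from RATE at START up to 100 at FULL.
        echo $(( rate + (100 - rate) * (n - start) / (full - start) ))
    fi
}

# Example: the "New Style" setting from the grep output, 10:30:60.
drop_pct 9 10 30 60
drop_pct 10 10 30 60
drop_pct 17 10 30 60
drop_pct 60 10 30 60
```

The call with N=17 approximates the worst case for this cluster: the 18th node connecting while the other 17 are still unauthenticated. That assumes all nodes connect at the same instant, which vbr.py may or may not do.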
If you want to support 50 connections at a time with no connections failing, set MaxStartups to 50 with no colons.
MaxStartups 50
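One way to apply this setting is sketched below. The helper name set_max_startups is hypothetical. It assumes the keyword is spelled exactly "MaxStartups" in the file, and that the file does not end inside a Match block, since an appended line would then land in the wrong place.

```shell
# set_max_startups FILE VALUE   (hypothetical helper)
# Rewrites any existing MaxStartups line in FILE, commented out or not,
# to "MaxStartups VALUE". If no such line exists, appends one.
# A backup of the original is kept as FILE.bak when a line is rewritten.
set_max_startups() {
    file=$1; value=$2
    pattern='^[[:space:]]*#?[[:space:]]*MaxStartups[[:space:]]'
    if grep -Eq "$pattern" "$file"; then
        sed -i.bak -E "s/${pattern}.*/MaxStartups ${value}/" "$file"
    else
        # Assumption: appending at end of file is safe (no trailing Match block).
        printf 'MaxStartups %s\n' "$value" >> "$file"
    fi
}
```

Run as root on the backup host, this would be called as set_max_startups /etc/ssh/sshd_config 50. Afterwards, sshd -t checks the file for syntax errors, and sshd must be reloaded or restarted before the new value takes effect. The service name for that step varies by distribution.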
Comments
Was anything else done, apart from increasing MaxStartups, to fix this?
Thanks
Saumya