Backup fails: Error msg: Host key verification failed.
When trying to run a backup to a properly set up remote backup host on a 3-node cluster, I get the following error. I can SSH in passwordlessly to the other cluster with no problem; however, the backup always fails with the output below:
vbr.py --task backup --config-file backup2.ini
Copying...
1803: vbr client subproc on 10.0.0.106 terminates with returncode 1. Details in vbr_v_infoscout_node0002_client.log on that host.
Error msg: Host key verification failed.
Host key verification failed.
Traceback (most recent call last):
File "/tmp/vbr/vbr.py", line 2731, in work
remoteClient(args[0], args[1], args[2], args[3], args[4], args[5], args[6] == 'True')
File "/tmp/vbr/vbr.py", line 919, in remoteClient
ssList = subprocess.check_output(g["sshBackup"] + [sHost, cmd])
File "/opt/vertica/oss/python/lib/python2.7/subprocess.py", line 537, in check_output
raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['ssh', '-x', '10.0.0.249', 'ls -1 /data/infoscout_backup/v_infoscout_node0002']' returned non-zero exit status 255
Child processes terminated abnormally.
backup failed!
cleaning up...
1802: vbr client subproc on 10.0.0.105 terminates with returncode 2. Details in vbr_v_infoscout_node0001_client.log on that host.
Error msg: cancelled by SIGINT
1805: vbr client subproc on 10.0.0.107 terminates with returncode 255. Details in vbr_v_infoscout_node0003_client.log on that host.
Error msg: Killed by signal 2.
Retrying... #1
ERROR 4153: Node: v_infoscout_node0003: Cannot grab lock to create snapshot 'full_cluster_backup'. It might be used by others
When communicating with vertica, the process failed with code 1
backup failed!
Retrying... #2
ERROR 4153: Node: v_infoscout_node0003: Cannot grab lock to create snapshot 'full_cluster_backup'. It might be used by others
When communicating with vertica, the process failed with code 1
backup failed!
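For reference, this is the command from the traceback that is failing; running it by hand from the node that reported the error should reproduce the host-key prompt if known_hosts is the issue:
# run as dbadmin on 10.0.0.106, the node that reported returncode 1
ssh -x 10.0.0.249 'ls -1 /data/infoscout_backup/v_infoscout_node0002'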
Comments
Thank you!
The "returned non-zero exit status 255" error message typically indicates that although passwordless ssh is configured, all nodes do not have all the other nodes in their known hosts file. So you may have to go to each host in the cluster and do "ssh hostname" to all hosts including the one you are on and answer "yes" to the prompt to add to known hosts file if given.
Please try this and let us know the results at your earliest convenience.
Thanks,
Rory
What is the node with IP 10.0.0.249? Can you run the command below from there?
ssh -x 10.0.0.249
i.e., run the ssh command from that node to itself and accept 'yes' when prompted.
Any help?
Please confirm that the known_hosts file on each node includes every node in the cluster, including the node you're on. To do so:
cat ~/.ssh/known_hosts
So, in a 3-node cluster, the known_hosts file on 10.5.5.10 should include:
10.5.5.10
10.5.5.11
10.5.5.12
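As a quick check, a small loop (using the example IPs above) that reports any node missing from known_hosts; run it on each node. This assumes the entries are stored unhashed; if HashKnownHosts is enabled, use ssh-keygen -F <ip> instead of grep:
for host in 10.5.5.10 10.5.5.11 10.5.5.12; do
    grep -qF "$host" ~/.ssh/known_hosts || echo "$host is missing from known_hosts"
done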
Also, please send in the latest error message you're getting after ensuring that the SSH host keys have been accepted on all nodes.
Thanks,
Rory
My backup server is also my primary Vertica server.
snapshotName = testschemabkup
verticaConfig = True
restorePointLimit = 2
objects = xxxxx
[Database]
dbName =
dbUser =
dbPassword =
[Transmission]
[Mapping]
v_xxxxx_node0001 = d56uz:/users/home/dbadmin/
v_xxxxx_node0002 = d56uz:/users/home/dbadmin/
v_xxxxx_node0003 = d56uz:/users/home/dbadmin/
v_vx_node0004 = d56uz:/users/home/dbadmin/
I got the same error too. Yes, I can see my server IP addresses in cat ~/.ssh/known_hosts.
Could you please help in this regard?
If there are unwanted snapshots, you may remove them with the command below and then retry:
select remove_database_snapshot('full_cluster_backup')
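If it helps to check for stale snapshots from the shell first, here is a rough sketch, assuming vsql is on the dbadmin PATH and that this Vertica version exposes the v_monitor.database_snapshots system table:
# list existing snapshots, then drop the stale one if it is still there
vsql -c "SELECT * FROM v_monitor.database_snapshots;"
vsql -c "SELECT remove_database_snapshot('full_cluster_backup');"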