Backup fails: Error msg: Host key verification failed.

Hello,

When I try to run a backup to a properly set up remote backup host from a 3-node cluster, I get the following error. Passwordless ssh to the other cluster works without problems, but the backup always fails with the output below:

vbr.py --task backup --config-file backup2.ini 


Copying... 

1803: vbr client subproc on 10.0.0.106 terminates with returncode 1. Details in vbr_v_infoscout_node0002_client.log on that host. 

Error msg: Host key verification failed.

Host key verification failed.

Traceback (most recent call last):

  File "/tmp/vbr/vbr.py", line 2731, in work

    remoteClient(args[0], args[1], args[2], args[3], args[4], args[5], args[6] == 'True')

  File "/tmp/vbr/vbr.py", line 919, in remoteClient

    ssList = subprocess.check_output(g["sshBackup"] + [sHost, cmd])

  File "/opt/vertica/oss/python/lib/python2.7/subprocess.py", line 537, in check_output

    raise CalledProcessError(retcode, cmd, output=output)

CalledProcessError: Command '['ssh', '-x', '10.0.0.249', 'ls -1 /data/infoscout_backup/v_infoscout_node0002']' returned non-zero exit status 255


Child processes terminated abnormally.

backup failed!

cleaning up...

1802: vbr client subproc on 10.0.0.105 terminates with returncode 2. Details in vbr_v_infoscout_node0001_client.log on that host. 

Error msg: cancelled by SIGINT


1805: vbr client subproc on 10.0.0.107 terminates with returncode 255. Details in vbr_v_infoscout_node0003_client.log on that host. 

Error msg: Killed by signal 2.


Retrying... #1

ERROR 4153:  Node: v_infoscout_node0003: Cannot grab lock to create snapshot 'full_cluster_backup'. It might be used by others

When communicating with vertica, the process failed with code 1

backup failed!

Retrying... #2

ERROR 4153:  Node: v_infoscout_node0003: Cannot grab lock to create snapshot 'full_cluster_backup'. It might be used by others

When communicating with vertica, the process failed with code 1

backup failed! 



Comments

  • Hello Vertica. Can someone help with this issue?

    Thank you!
  • Hello Atomix,

    The "returned non-zero exit status 255" error typically means that, although passwordless ssh is configured, not every node has all the other nodes in its known_hosts file. Go to each host in the cluster, run "ssh hostname" against every host, including the one you are on, and answer "yes" if you are prompted to add the host to the known_hosts file.

    Please try this and let us know the results at your earliest convenience.

    Thanks,
    Rory
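The check described above can be enumerated rather than done by memory. The following is a minimal sketch, not part of vbr: `ssh_check_commands` is a hypothetical helper, the node IPs are taken from the log in the question, and treating 10.0.0.249 as the backup host is an assumption based on the failing ssh command in the traceback. It only builds the list of (source node, command) pairs; each command must then be run on its source node as the user that runs vbr.py. With `BatchMode=yes`, ssh does not prompt, so an unknown host key shows up as a failure instead of a question.

```python
def ssh_check_commands(nodes, backup_hosts=()):
    """Build (source, command) pairs for a full known_hosts check.

    Every node should be able to reach every node (itself included)
    and every backup host without being prompted.
    """
    # Targets are all cluster nodes plus any backup hosts not already listed.
    targets = list(nodes) + [h for h in backup_hosts if h not in nodes]
    return [
        # BatchMode=yes makes ssh fail instead of asking to accept a host key.
        (source, ["ssh", "-x", "-o", "BatchMode=yes", target, "true"])
        for source in nodes
        for target in targets
    ]


# Node IPs from the log above; the backup host is an assumption.
nodes = ["10.0.0.105", "10.0.0.106", "10.0.0.107"]
checks = ssh_check_commands(nodes, backup_hosts=["10.0.0.249"])
for source, command in checks:
    print("on %s run: %s" % (source, " ".join(command)))
```

Any pair that fails with "Host key verification failed." points at the node whose known_hosts file is missing an entry.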
  • Prasanta_Pal - Employee
    In addition to Rory's recommendation, you can try the following.

    Which node has the IP 10.0.0.249?

    Run the command below from that node:
    ssh -x 10.0.0.249

    That is, run the ssh command from that node to itself and answer 'yes' when prompted.

  • Passwordless ssh works properly, but I still get the error.
    Any help?
  • Hello Susan,

    Please confirm that the known_hosts file on each node includes each node in the cluster including the node you're on. To do so:

    cat ~/.ssh/known_hosts

    So, in a 3-node cluster, the known_hosts file on 10.5.5.10 should include:

    10.5.5.10
    10.5.5.11
    10.5.5.12

    Also, please send in the latest error message you're getting after ensuring that SSH has been enabled.

    Thanks,
    Rory
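Reading known_hosts by eye can be misleading, because entries may be hashed (lines starting with `|1|`), in which case the IP addresses are not visible in the `cat` output. The following is a minimal sketch, assuming the standard OpenSSH known_hosts layout; `missing_hosts` is a hypothetical helper, and it deliberately ignores wildcard patterns and `[host]:port` entries.

```python
import base64
import hashlib
import hmac


def _entry_matches(pattern, host):
    """Return True if one known_hosts host pattern matches `host`."""
    if pattern.startswith("|1|"):
        # Hashed entry: |1|base64(salt)|base64(HMAC-SHA1(key=salt, msg=host))
        try:
            _, _, salt_b64, digest_b64 = pattern.split("|")
            salt = base64.b64decode(salt_b64)
            expected = base64.b64decode(digest_b64)
        except ValueError:
            return False  # malformed hashed entry
        actual = hmac.new(salt, host.encode(), hashlib.sha1).digest()
        return hmac.compare_digest(actual, expected)
    # Plain entry: exact match only (no wildcard or port handling).
    return pattern == host


def missing_hosts(known_hosts_text, hosts):
    """Return the hosts from `hosts` that have no entry in the given text."""
    seen = set()
    for line in known_hosts_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and @cert-authority / @revoked marker lines.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        patterns = line.split()[0].split(",")
        for host in hosts:
            if any(_entry_matches(p, host) for p in patterns):
                seen.add(host)
    return [h for h in hosts if h not in seen]


# Example with the addresses from the comment above; the key data is a placeholder.
sample = "10.5.5.10 ssh-ed25519 AAAA\n10.5.5.11,node2 ssh-rsa AAAA\n"
print(missing_hosts(sample, ["10.5.5.10", "10.5.5.11", "10.5.5.12"]))
```

To check a real node, pass the contents of `~/.ssh/known_hosts` for the user that runs vbr.py, and include the backup host in the list of hosts as well as the cluster nodes.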
  • Hi Rory,

    I am using my primary Vertica server as the backup server as well. Here is my config file:

    snapshotName = testschemabkup
    verticaConfig = True
    restorePointLimit = 2
    objects = xxxxx

    [Database]
    dbName = 
    dbUser = 
    dbPassword = 

    [Transmission]

    [Mapping]
    v_xxxxx_node0001 = d56uz:/users/home/dbadmin/
    v_xxxxx_node0002 = d56uz:/users/home/dbadmin/
    v_xxxxx_node0003 = d56uz:/users/home/dbadmin/
    v_vx_node0004 = d56uz:/users/home/dbadmin/
      

    I get the same error too. Yes, I can see my server IP addresses in the output of cat ~/.ssh/known_hosts.

    Could you please help with this?
  • Prasanta_Pal - Employee
    Check for existing snapshots:

    select * from database_snapshots;

    If there are unwanted snapshots, you can remove them with the command below and then retry the backup:

    select remove_database_snapshot('full_cluster_backup');


