
Cannot replace a permanently down node: update_vertica fails with "host is unreachable"

skeswani Employee
edited October 2019 in General Discussion

update_vertica/install_vertica --remove-host/--add-host fails because the host is unreachable, so I cannot replace a permanently down node.

Yikes! A node went down permanently,
and I want to replace that node with a new node (which has a different IP).
When this happened, I first tried to add a node to the cluster.
I ran update_vertica --add-host new_host, but it fails saying old_node is down.
However, if I try
update_vertica --remove-host old_host, it fails saying old_node is part of the test2 database.

I am stuck; I cannot replace the node!
Can someone help me?

Answers

  • skeswani Employee
    edited October 2019

    You want to replace an old/dead node with a new node (which has a different IP).
    First of all, do NOT rebalance; that is not required here and is the wrong solution to the problem.

    Here is a step-by-step example of how to replace a permanently down node.

    Consider a cluster where one node is down and you want to replace that node, 10.11.12.24 (dead/old node), with 10.11.12.30 (new node), which you have just set up.

    Take the new node with IP 10.11.12.30 and set it up as a single-node cluster:

    $ sudo /opt/vertica/sbin/install_vertica -s 10.11.12.30 --clean <== PROVIDE THE SAME ARGS USED FOR THE CLUSTER (on node 10.11.12.10, run: grep install_opts /opt/vertica/config/admintools.conf)
    Vertica Analytic Database 9.2.1-1 Installation Tool
    ...
    Installation complete.
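    If you prefer to script the install_opts lookup mentioned in the command above, the sketch below prints the stored value. It is a hypothetical helper (get_install_opts) and assumes install_opts sits on a single "install_opts = ..." line; on some versions the value may look like a Python-style list such as ['-s', '...', '--clean']. You still have to turn it into command-line arguments yourself and replace the cluster host list with the new node's IP (-s 10.11.12.30).

```shell
# get_install_opts [CONF]  (hypothetical helper)
# Prints the value of the install_opts line from admintools.conf.
# Assumes a single "install_opts = ..." line in the file.
get_install_opts() {
  local conf="${1:-/opt/vertica/config/admintools.conf}"
  # -n suppresses normal output; the p flag prints only the rewritten line
  sed -n 's/^[[:space:]]*install_opts[[:space:]]*=[[:space:]]*//p' "$conf"
}
```

    Run it on 10.11.12.10 (for example, get_install_opts) and copy the options into the install_vertica command on the new node.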

    Make sure passwordless SSH is set up correctly for user dbadmin between this new node and all nodes of the existing cluster.
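    One way to confirm this is to attempt a non-interactive login to every host and treat any password prompt as a failure. The helper below is a hypothetical sketch built on standard OpenSSH options: BatchMode=yes makes ssh fail instead of prompting, and ConnectTimeout=5 keeps an unreachable host from hanging the loop. Run it as dbadmin on the new node against the existing nodes, and on each existing node against 10.11.12.30, since the check has to pass in both directions.

```shell
# check_passwordless_ssh HOST...  (hypothetical helper)
# Tries a non-interactive "ssh HOST true" for every host and returns 0 only
# if all of them succeed without asking for a password.
check_passwordless_ssh() {
  local host failed=0
  for host in "$@"; do
    # BatchMode=yes: fail instead of prompting for a password or passphrase
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
      echo "OK   $host"
    else
      echo "FAIL $host" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

    For example, on 10.11.12.30: check_passwordless_ssh 10.11.12.10 10.11.12.20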

    Next, edit admintools.conf on one of the surviving nodes of the existing cluster so that it references the new node, then distribute it to the other nodes.

    Node 10.11.12.24 is dead and gone:

    dbadmin=> select node_address, node_state from nodes;
     node_address | node_state
    --------------+------------
     10.11.12.10  | UP
     10.11.12.20  | UP
     10.11.12.24  | DOWN
    (3 rows)

    Modify the admintools.conf file to add the new node, as shown below.
    Original:

    $ grep -F -A 1 '[Cluster]' /opt/vertica/config/admintools.conf
    [Cluster]
    hosts = 10.11.12.10,10.11.12.20,10.11.12.24

    New: the file now has the new node's IP address appended to the hosts list and an entry for the new node you have set up (10.11.12.30):

    $ grep -F -A 1 '[Cluster]' /opt/vertica/config/admintools.conf
    [Cluster]
    hosts = 10.11.12.10,10.11.12.20,10.11.12.24,10.11.12.30 <== 10.11.12.30 IS APPENDED TO THIS LINE
    $ grep -F -A 4 '[Nodes]' /opt/vertica/config/admintools.conf
    [Nodes]
    v_test2_node0001 = 10.11.12.10,/vertica/data,/vertica/data
    v_test2_node0002 = 10.11.12.20,/vertica/data,/vertica/data
    v_test2_node0003 = 10.11.12.24,/vertica/data,/vertica/data
    v_test2_node0004 = 10.11.12.30,/vertica/data,/vertica/data <== THIS LINE IS ADDED; note that it says node0004
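    The same edit can be made with a small script instead of a text editor. This is only a sketch: add_node_to_conf is a hypothetical helper, and it assumes the "key = value" layout shown above, a single hosts line under [Cluster], and a node name you pick yourself (v_test2_node0004 here). It saves a .bak copy, appends the IP to the hosts line, and adds the node entry at the end of the [Nodes] section.

```shell
# add_node_to_conf CONF NEW_IP NODE_LINE  (hypothetical helper)
# Appends NEW_IP to the [Cluster] hosts line and adds NODE_LINE at the end of
# the [Nodes] section. Assumes the "key = value" layout shown in this post.
add_node_to_conf() {
  local conf="$1" new_ip="$2" node_line="$3" tmp
  cp -p "$conf" "$conf.bak" || return 1   # keep a backup of the original
  tmp=$(mktemp) || return 1
  if awk -v ip="$new_ip" -v line="$node_line" '
    # print the new node entry once, when the [Nodes] section ends
    function add_node_line() {
      if (section == "[Nodes]" && !added) { print line; added = 1 }
    }
    /^[ \t]*$/ { pending = pending $0 "\n"; next }  # hold blank lines back
    /^\[/      { add_node_line(); section = $0 }    # a new section starts
               { printf "%s", pending; pending = "" }
    section == "[Cluster]" && /^hosts[ \t]*=/ { $0 = $0 "," ip }
               { print }
    END        { add_node_line(); printf "%s", pending }
  ' "$conf" > "$tmp"; then
    cat "$tmp" > "$conf"   # rewrite in place so owner and mode are kept
  else
    rm -f "$tmp"
    return 1
  fi
  rm -f "$tmp"
}
```

    For example, on 10.11.12.10 before distributing: add_node_to_conf /opt/vertica/config/admintools.conf 10.11.12.30 'v_test2_node0004 = 10.11.12.30,/vertica/data,/vertica/data'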

    Distribute the newly modified admintools.conf file to all nodes:

    $ admintools -t distribute_config_files
    Initiating admintools.conf distribution...
    Could not send admintools.conf to all nodes in cluster.
    Hint: Is passwordless ssh configured correctly?
    Error message:
    Could not copy file to host 10.11.12.24 <=== THIS IS EXPECTED TO FAIL, IGNORE IT

    Check that admintools.conf was distributed correctly. Notice that the new node is included here:

    $ for node in 10.11.12.10 10.11.12.20 10.11.12.30; do ssh "$node" md5sum /opt/vertica/config/admintools.conf; done
    0b2973050e63e121744fc89004d1b3ab /opt/vertica/config/admintools.conf
    0b2973050e63e121744fc89004d1b3ab /opt/vertica/config/admintools.conf
    0b2973050e63e121744fc89004d1b3ab /opt/vertica/config/admintools.conf
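    Comparing the hashes by eye works for three nodes; for larger clusters, a sketch like the following fails unless every listed host returned a checksum and all checksums are identical. Both function names are hypothetical. Leave the dead node (10.11.12.24) out of the host list, since it cannot answer.

```shell
# all_checksums_match COUNT  (hypothetical helper)
# Succeeds only if stdin holds exactly COUNT md5sum-style lines that all
# share the same hash in the first column.
all_checksums_match() {
  awk -v want="$1" '
    NF { total++; if (!seen[$1]++) distinct++ }
    END { ok = (total == want && distinct == 1); exit (ok ? 0 : 1) }
  '
}

# verify_conf_everywhere HOST...  (hypothetical helper)
# Collects the admintools.conf checksum from each host over ssh; an
# unreachable host produces no line, so the count check then fails.
verify_conf_everywhere() {
  local conf=/opt/vertica/config/admintools.conf host
  for host in "$@"; do
    # BatchMode=yes: fail instead of prompting for a password
    ssh -o BatchMode=yes "$host" md5sum "$conf" ||
      echo "cannot read $conf on $host" >&2
  done | all_checksums_match "$#"
}
```

    For example: verify_conf_everywhere 10.11.12.10 10.11.12.20 10.11.12.30 && echo "admintools.conf is identical on all hosts"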

    Force a recovery and a node replacement:

    $ admintools -t db_replace_node -o 10.11.12.24 -n 10.11.12.30 -d test2
    Replicating configuration to all nodes
    Starting database on replacment host
    Restarting host [10.11.12.30] with catalog [v_test2_node0003_catalog]
    Issuing multi-node restart
    Starting nodes:
    v_test2_node0003 (10.11.12.30)
    Starting Vertica on all nodes. Please wait, databases with a large catalog may take a while to initialize.
    Node Status: v_test2_node0001: (UP) v_test2_node0003: (DOWN)
    Node Status: v_test2_node0001: (UP) v_test2_node0003: (RECOVERING)
    Node Status: v_test2_node0001: (UP) v_test2_node0003: (UP)
    Checking database state
    Node Status: v_test2_node0001: (UP) v_test2_node0002: (UP) v_test2_node0003: (UP)
    Deleting catalog and data directories
    Error(s) detected while deleting catalog and data directories: Host: 10.11.12.24 Reported error in removal <== EXPECTED, IGNORE (the database directories will need to be removed manually from 10.11.12.24)

    Voilà!

    dbadmin=> select node_address, node_state from nodes;
     node_address | node_state
    --------------+------------
     10.11.12.10  | UP
     10.11.12.20  | UP
     10.11.12.30  | UP
    (3 rows)
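    To check the final state from a script, you can send the same query to vsql in unaligned, tuples-only mode and fail if any node is not UP. This assumes vsql's -A, -t, -F and -c options (as in psql) and a hypothetical all_nodes_up helper; add the user and password options your environment needs.

```shell
# all_nodes_up  (hypothetical helper)
# Reads "address|state" lines, as printed by vsql -A -t -F '|', on stdin and
# succeeds only if at least one node is listed and every node is UP.
all_nodes_up() {
  awk -F'|' '
    NF {
      total++
      if ($2 != "UP") { print ("not UP: " $1 " (" $2 ")") > "/dev/stderr"; bad++ }
    }
    END { exit ((total > 0 && bad == 0) ? 0 : 1) }
  '
}
```

    For example: vsql -A -t -F '|' -c "select node_address, node_state from nodes;" | all_nodes_up && echo "all nodes UP"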

    Finally, clean up admintools.conf:

    $ grep -F -A 1 '[Cluster]' /opt/vertica/config/admintools.conf
    [Cluster]
    hosts = 10.11.12.10,10.11.12.20,10.11.12.30 <== REMOVED DEAD NODE 10.11.12.24
    $ grep -F -A 4 '[Nodes]' /opt/vertica/config/admintools.conf
    [Nodes]
    v_test2_node0001 = 10.11.12.10,/vertica/data,/vertica/data
    v_test2_node0002 = 10.11.12.20,/vertica/data,/vertica/data
    v_test2_node0004 = 10.11.12.30,/vertica/data,/vertica/data <== REMOVE THIS LINE; you added it earlier and it is redundant now
    v_test2_node0003 = 10.11.12.30,/vertica/data,/vertica/data
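    The cleanup can also be scripted. In this sketch (cleanup_conf is a hypothetical helper that assumes the same layout as above), the hosts list is rebuilt from its non-empty entries, so removing 10.11.12.24 cannot leave a stray ",," behind, and the temporary v_test2_node0004 entry is dropped while v_test2_node0003 is kept. A .bak copy is saved first.

```shell
# cleanup_conf CONF DEAD_IP STALE_NODE  (hypothetical helper)
# Removes DEAD_IP from the [Cluster] hosts line and deletes the STALE_NODE
# entry from [Nodes]. Assumes the "key = value" layout shown in this post.
cleanup_conf() {
  local conf="$1" dead_ip="$2" stale_node="$3" tmp
  cp -p "$conf" "$conf.bak" || return 1   # keep a backup of the original
  tmp=$(mktemp) || return 1
  if awk -v dead="$dead_ip" -v stale="$stale_node" '
    /^\[/ { section = $0 }
    section == "[Cluster]" && /^hosts[ \t]*=/ {
      n = split(substr($0, index($0, "=") + 1), hs, ",")
      out = ""
      for (i = 1; i <= n; i++) {
        h = hs[i]
        gsub(/[ \t]/, "", h)
        # skip the dead host and empty entries so no ",," is left behind
        if (h != "" && h != dead) out = (out == "" ? h : out "," h)
      }
      $0 = "hosts = " out
    }
    section == "[Nodes]" {
      key = $0
      sub(/[ \t]*=.*/, "", key)
      if (key == stale) next   # drop the temporary entry added earlier
    }
    { print }
  ' "$conf" > "$tmp"; then
    cat "$tmp" > "$conf"   # rewrite in place so owner and mode are kept
  else
    rm -f "$tmp"
    return 1
  fi
  rm -f "$tmp"
}
```

    For example: cleanup_conf /opt/vertica/config/admintools.conf 10.11.12.24 v_test2_node0004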

    Distribute admintools.conf again:

    $ admintools -t distribute_config_files
    Initiating admintools.conf distribution...
    Local admintools.conf sent to all nodes in the cluster.
    $ for node in 10.11.12.10 10.11.12.20 10.11.12.30; do ssh "$node" md5sum /opt/vertica/config/admintools.conf; done
    e2c3e1a650afe4958034374c096a5881 /opt/vertica/config/admintools.conf
    e2c3e1a650afe4958034374c096a5881 /opt/vertica/config/admintools.conf
    e2c3e1a650afe4958034374c096a5881 /opt/vertica/config/admintools.conf

  • Ravi Employee

    Thanks Sumeet, this is a great solution. I request that admintools get an option to replace an unreachable node with a new node.

  • chaima Employee

    Thanks Sumeet for sharing!
