Migrating a single-node (localhost-bound) Vertica database to a 3-node cluster

msanjib Vertica Customer

Hi Team,
I have been using a single-node (127.0.0.1) Vertica cluster. I am now planning to move the DB from this node to a new 3-node cluster. I have already changed the IP to a new non-local IP using "admintools -t re_ip -f mapfile.txt". I changed the IP just to avoid any backup/restore issues.

AFAIK, by design we can't restore a full backup from a single-node cluster to a 3-node cluster. So I decided to move to a brand new single-node cluster (bound to x.x.x.x, with the same DB name). I restored the full backup from the source node there. It worked fine.

Next, I wanted to add 2 new nodes to the above single-node cluster (IP = x.x.x.x). I used the command below to add 2 hosts to the cluster (x.x.x.x is node1):
/opt/vertica/sbin/update_vertica --add-host y.y.y.y,z.z.z.z --dba-user verticauser --license ~/123.dat --failure-threshold FAIL --rpm ~/vertica-9.2.1-0.x86_64.RHEL6.rpm
It worked fine. I am not sure what the best way is to verify that the hosts were added to the cluster correctly. The command did not throw any error.

Then I tried to add the database to the new hosts:
admintools -t db_add_node -d testdb -p novell -s y.y.y.y,z.z.z.z
It looks like this worked as well. I believe this step should create the database directory in the same location on the new nodes. However, I do not see all the expected files under the catalog directory (e.g. /v_testdb_node0002_catalog/); only spread.conf and vertica.conf exist.

Issue:
Starting the database on both nodes throws an error.

*** Starting database: testdb ***
Starting nodes:
v_testdb_node0001 (x.x.x.x)
v_testdb_node0002 (y.y.y.y)
Error: the vertica process for the database is running on the following hosts:
y.y.y.y
This may be because the process has not completed previous shutdown activities. Please wait and retry again.
Database start up failed. Processes still running.
Found these errors in startup.logs on hosts:
.../v_testdb_node0002_catalog/startup.log: exception Can't open .../v_testdb_node0002_catalog/startup.logfor reading: [Errno 2] No such file or directory:

I may be doing something completely wrong. Passwordless ssh is set up correctly. It looks like the files required for the database (catalog) are not being replicated to the other 2 nodes.
I would appreciate any help in resolving this issue.

Regards,
SM

Best Answers

  • msanjib Vertica Customer
    Answer ✓

    startup.log

    {
    "goal" : 193483140,
    "node" : "v_appdefenderdb_node0001",
    "progress" : 171015310,
    "stage" : "Read DataCollector",
    "text" : "Inventory files (bytes)",
    "timestamp" : "2020-01-22 10:59:34.843"
    }
    {
    "goal" : 193483140,
    "node" : "v_appdefenderdb_node0001",
    "progress" : 191441417,
    "stage" : "Read DataCollector",
    "text" : "Inventory files (bytes)",
    "timestamp" : "2020-01-22 10:59:34.912"
    }
    {
    "node" : "v_appdefenderdb_node0001",
    "stage" : "Check Storage",
    "text" : "Removing unnecessary storage files",
    "timestamp" : "2020-01-22 10:59:35.006"
    }
    {
    "goal" : 24,
    "node" : "v_appdefenderdb_node0001",
    "progress" : 0,
    "stage" : "Check Storage",
    "text" : "Confirming storage matches catalog (files)",
    "timestamp" : "2020-01-22 10:59:36.249"
    }
    {
    "goal" : 24,
    "node" : "v_appdefenderdb_node0001",
    "progress" : 11,
    "stage" : "Check Storage",
    :

Answers

  • Bryan_H Vertica Employee Administrator

    Is the single node listed as localhost or 127.0.0.1 in /opt/vertica/config/admintools.conf? If so, you will not be able to add nodes to the cluster. Please review the "Important" comment at https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/ManageNodes/AddingNodes.htm?zoom_highlight="single node"
    To create a three-node cluster, I recommend the following procedure:

    • back up the existing single node
    • create a new node with the valid IP and/or hostname
    • restore the database on the new single node instance
    • then add the two nodes to this new instance
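
As a quick way to run the first check above, here is a minimal Python sketch that scans the text of admintools.conf for loopback entries. The function name `loopback_lines` and the sample text in the usage note are made up for illustration; the sketch does not assume any particular layout of the file and simply flags non-comment lines that mention localhost or 127.0.0.1.

```python
import re

# Matches "localhost" or "127.0.0.1" as a whole token, so that
# addresses such as 10.127.0.0.1 or 127.0.0.10 are not flagged.
LOOPBACK = re.compile(r"(?<![\w.])(localhost|127\.0\.0\.1)(?![\w.])")


def loopback_lines(conf_text):
    """Return (line_number, line) pairs that mention a loopback host."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore comment lines
        if LOOPBACK.search(stripped):
            hits.append((number, stripped))
    return hits
```

To use it, read /opt/vertica/config/admintools.conf into a string and pass it in. An empty result means no loopback entries were found; any hit means the node is still registered under a loopback address.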
  • msanjib Vertica Customer

    Thanks.
    In fact, I followed exactly those steps.
    My cluster is still not starting. Is there any way to recover from this state?

    Regards,
    SM

    admintools -t start_db -d appdefenderdb -p

    Starting nodes:
    v_appdefenderdb_node0001 (x.x.x.x)
    v_appdefenderdb_node0002 (y.y.y.y)
    v_appdefenderdb_node0003 (z.z.z.z)
    Starting Vertica on all nodes. Please wait, databases with a large catalog may take a while to initialize.
    Checking database state
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    It is suggested that you continue waiting.
    Nodes in TRANSITIONAL state: ,,
    Nodes DOWN: v_appdefenderdb_node0001, v_appdefenderdb_node0002, v_appdefenderdb_node0003 (may be still initializing).
    It is suggested that you continue waiting.
    Do you want to continue waiting? (yes/no) [yes]
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
    Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
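
When the status line repeats like this, it can help to reduce each line to the nodes that are not up. The following is a minimal Python sketch that parses a "Node Status:" line in the format shown above. The helper names `parse_node_status` and `nodes_not_up` are hypothetical, and the sketch assumes a healthy node is reported as "(UP)".

```python
import re

# One "<node name>: (<STATE>)" pair, as printed in the output above.
NODE_STATE = re.compile(r"(\S+): \((\w+)\)")


def parse_node_status(line):
    """Map node name -> state for a single 'Node Status:' line."""
    prefix = "Node Status:"
    text = line.strip()
    if not text.startswith(prefix):
        return {}  # not a status line
    return dict(NODE_STATE.findall(text[len(prefix):]))


def nodes_not_up(line):
    """Return the sorted names of nodes whose state is anything but UP."""
    states = parse_node_status(line)
    return sorted(name for name, state in states.items() if state != "UP")
```

Feeding it the last status line from the run above would list all three nodes, which matches the "Nodes DOWN" summary that admintools prints.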

  • SruthiA Administrator
    edited January 2020

    Please check startup.log to find where startup is failing.
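
startup.log, as pasted in this thread, is a sequence of JSON objects written one after another, so the last complete record shows the stage startup reached. Below is a minimal Python sketch for pulling that record out. The function names `iter_json_objects` and `last_stage` are made up for illustration, and the sketch assumes the file has the same shape as the excerpt in this thread.

```python
import json


def iter_json_objects(text):
    """Yield each complete JSON object from a stream of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        try:
            obj, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            break  # stop at a truncated or partially written record
        yield obj


def last_stage(text):
    """Return the last complete record, or None if there is none."""
    last = None
    for record in iter_json_objects(text):
        last = record
    return last
```

Read the node's startup.log into a string and call `last_stage(text)`. The "stage", "text" and "timestamp" fields of the result show where that node stopped reporting progress.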

  • msanjib Vertica Customer

    Please find below the logs from the vertica.log file. I can't find any clue in them. Maybe I have to enable a debug-level flag to get the complete information. The log below just keeps repeating. No luck so far; let me check all the network prerequisites one more time.
    /appdefender/db_data/verticauser/appdefenderdb/v_appdefenderdb_node0001_catalog/vertica.log

    2020-01-22 10:47:55.002 MetadataPoolMonitor:0x7f734bfff700 @v_appdefenderdb_node0001: 00000/7794: Updated metadata pool: Memory(KB): 22435
    2020-01-22 10:47:57.010 Cluster Inviter:0x7f734bfff700 [Comms] My global sequence value is 131700
    2020-01-22 10:47:59.001 Cluster Inviter:0x7f734bfff700 [Comms] My global sequence value is 131700
    2020-01-22 10:48:01.001 Cluster Inviter:0x7f7359b0b700 [Comms] My global sequence value is 131700
    2020-01-22 10:48:02.000 Timer Service:0x7f735b30e700 [Util] Task 'FeatureUseLogger' enabled
    2020-01-22 10:48:02.000 Timer Service:0x7f735b30e700 [Util] Task 'LicenseSizeAuditor' enabled
    Thanks
    SM

  • msanjib Vertica Customer

    Finally, it worked.
    I believe the cluster uses UDP broadcast traffic by default for cluster communication. I should have opted for "-T" ("--point-to-point") when adding the hosts with the install_vertica or update_vertica script.
    As I understand it, this would force the cluster nodes (spread) to communicate over TCP instead of UDP. I believe this is the best option, as it will not flood the network with traffic.

    I just disabled the firewall for testing purposes and it worked.
    Thanks for helping me troubleshoot.
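
For anyone who hits the same symptom, a rough way to spot a firewall problem before disabling it entirely is to probe the TCP ports between nodes. The sketch below is only that: the helper names `tcp_port_open` and `probe` are hypothetical, and the port numbers in the usage note are assumptions to check against the Vertica documentation for your version. It only tests TCP connectivity, so it cannot confirm that spread's UDP traffic gets through.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe(hosts, ports, timeout=2.0):
    """Return {(host, port): reachable} for every host/port combination."""
    return {
        (host, port): tcp_port_open(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

For example, `probe(["y.y.y.y", "z.z.z.z"], [5433, 5434, 4803])`, run from node1 with the real addresses substituted, would show which of those assumed ports are blocked. A port with no listener also reports False, so treat the result as a hint rather than proof.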

  • SruthiA Administrator

    Good to know it worked. Please review the link below and make sure everything is set as recommended, to avoid performance hits and other issues.

    https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/InstallationGuide/BeforeYouInstall/OsConfigTaskOverview.htm?
