Unable to start cluster

Hello,

We have a huge problem with the database we use in production. We tried to add new nodes to our 3-node cluster, but the operation failed and the cluster is now in the 'down' state. We removed the new nodes and tried to restart the cluster, without any luck. All nodes are now stuck in the INITIALIZING state, and we have been waiting more than an hour for them to start, so something is clearly wrong.

This is what we see in vertica.log:

2015-06-16 02:24:26.994 Init Session:0x7f7eec012660 <FATAL> @v_stats3_node0001: {SessionRun} 57V03/5785: Cluster Status Request by XX.XXX.XXX.XX:41890
HINT: Cluster State: stats3
INITIALIZING: 1 of 6 (v_stats3_node0001)
----
LOCATION: initSession, /scratch_a/release/30493/vbuild/vertica/Session/ClientSession.cpp:436
2015-06-16 02:24:29.452 Init Session:0x7f7eec011280 <LOG> @v_stats3_node0001: 00000/2705: Connection received: host=XX.XXX.XXX.XX port=40515 (connCnt 1)
2015-06-16 02:24:29.452 Init Session:0x7f7eec011280 <LOG> @v_stats3_node0001: 00000/4540: Received SSL negotiation startup packet
2015-06-16 02:24:29.452 Init Session:0x7f7eec011280 <LOG> @v_stats3_node0001: 00000/4691: Sending SSL negotiation response 'N'
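
The cluster-state hint in this excerpt reports, per state, how many nodes are in that state out of the total the starting node expects; the "1 of 6" is what later revealed the mismatch with a 3-node cluster. A minimal Python sketch that extracts those counts, assuming every state line looks exactly like the one above and that several node names would be comma-separated (the excerpt shows only one node, so that separator is a guess):

```python
import re
from dataclasses import dataclass
from typing import List

# Matches a state line from the hint, e.g. "INITIALIZING: 1 of 6 (v_stats3_node0001)".
STATE_RE = re.compile(
    r"^(?P<state>[A-Z_]+): (?P<count>\d+) of (?P<total>\d+) \((?P<nodes>[^)]*)\)$"
)

@dataclass
class StateSummary:
    state: str        # e.g. "INITIALIZING"
    count: int        # nodes currently in this state
    total: int        # nodes the starting node expects in the cluster
    nodes: List[str]  # node names listed in parentheses

def parse_cluster_state(lines: List[str]) -> List[StateSummary]:
    """Parse the state lines of a 'HINT: Cluster State' block, stopping at '----'."""
    summaries = []
    for raw in lines:
        line = raw.strip()
        if line == "----":
            break
        m = STATE_RE.match(line)
        if m is None:
            continue  # e.g. the "HINT: Cluster State: stats3" header line
        # Assumption: multiple node names would be separated by commas.
        nodes = [n.strip() for n in m.group("nodes").split(",") if n.strip()]
        summaries.append(StateSummary(m.group("state"), int(m.group("count")),
                                      int(m.group("total")), nodes))
    return summaries

hint = [
    "HINT: Cluster State: stats3",
    "INITIALIZING: 1 of 6 (v_stats3_node0001)",
    "----",
]
for s in parse_cluster_state(hint):
    print(f"{s.state}: {s.count} of {s.total} -> {s.nodes}")
```

Comparing `total` with the number of nodes admintools knows about is a quick way to spot the kind of mismatch discussed below.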

 

Or this:

2015-06-16 03:01:01.166 Init Session:0x7ff004012d10 <LOG> @v_stats3_node0001: 00000/2705: Connection received: host=127.0.0.1 port=55593 (connCnt 1)
2015-06-16 03:01:01.166 Init Session:0x7ff004012d10 <LOG> @v_stats3_node0001: 00000/4540: Received SSL negotiation startup packet
2015-06-16 03:01:01.166 Init Session:0x7ff004012d10 <LOG> @v_stats3_node0001: 00000/4691: Sending SSL negotiation response 'N'
2015-06-16 03:01:01.166 Init Session:0x7ff004012d10 <FATAL> @v_stats3_node0001: {SessionRun} 57V03/4149: Node startup/recovery in progress. Not yet ready to accept connections
LOCATION: initSession, /scratch_a/release/30493/vbuild/vertica/Session/ClientSession.cpp:459
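
Both excerpts share one line layout: timestamp, thread, level in angle brackets, `@node`, an optional `{component}`, then a SQLSTATE/message code and the message. A minimal sketch for pulling FATAL entries out of a long vertica.log, assuming every entry follows that layout (inferred only from the lines quoted here; continuation lines such as `HINT:` and `LOCATION:` are skipped):

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumed vertica.log entry layout, inferred only from the excerpts in this thread:
#   <date> <time> <thread> <LEVEL> @<node>: [{<component>} ]<SQLSTATE>/<code>: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>.+?) <(?P<level>[A-Z]+)> @(?P<node>[^:]+): "
    r"(?:\{(?P<component>[^}]*)\} )?"
    r"(?P<sqlstate>[0-9A-Z]{5})/(?P<code>\d+): (?P<message>.*)$"
)

def parse_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the fields of one log entry, or None for continuation lines (HINT:, LOCATION:, ...)."""
    m = LINE_RE.match(line.rstrip("\n"))
    return m.groupdict() if m else None

def fatal_summary(lines: Iterable[str]) -> Counter:
    """Count FATAL entries per (node, 'SQLSTATE/code')."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["level"] == "FATAL":
            counts[(entry["node"], f"{entry['sqlstate']}/{entry['code']}")] += 1
    return counts
```

Passing an open log file, e.g. `fatal_summary(open(path))`, gives a per-node tally; on the second excerpt above it counts one 57V03/4149 ("Node startup/recovery in progress") entry for v_stats3_node0001.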

Comments

  • Hi

    From the vertica.log output, it looks like entries for the three new nodes that failed have been added to admintools:

    HINT: Cluster State: stats3
    INITIALIZING: 1 of 6 (v_stats3_node0001)

    If you are a customer with a support license, I would suggest raising a support case with HP Vertica so they can help you resolve this issue.

    Regards

    Rahul Choudhary

  • Yes, we are a customer, but unfortunately our support contract has already expired and we cannot renew it quickly. Is there anything we can do now to restore our cluster?

    It is strange that the logs show the cluster trying to initialize 6 nodes when there should be only 3.

  • We found more strange things.

    In /Catalog/config.cat we can see 6 nodes in the cluster, but admintools shows only 3. So it looks like something went wrong while we were adding the new nodes. Based on vertica.log, node0001 is stuck in INITIALIZING mode because it tries to load 6 nodes.

    Can we edit this config manually and remove the rows for the wrong nodes, or could that leave us in an even worse situation?

  • Hi

    I am afraid that if the entries were added to the catalog, the only option left is to edit the catalog directly ("catalog hacking"), which should not be attempted without supervision from Vertica technical support.

    You might try changing certain config files, but that would be at your own discretion.

    I would also suggest trying to start the nodes individually on the existing three nodes, using the command below as an example:

    /opt/vertica/bin/vertica -D /home/dbadmin/test_crane/v_test_crane_node0001_catalog -C test_crane -n v_test_crane_node0001 -h 10.50.52.41 -p 5433 -P 4803 -Y ipv4

    You can find a similar command at the beginning of your vertica.log files and run it on each node individually, changing the IP address and node name accordingly.

    Regards

    Rahul Choudhary

  • Thank you, Rahul!

    We were able to run the vertica process manually on all nodes, including the new ones. After that we manually fixed the admintools config to add the missing nodes, and it started working without crashes. Then we stopped the database from admintools and started it properly. Now all nodes are up and running, and we have been able to start rebalancing the cluster.

  • Hi Sergey

    I am glad I was able to help bring your cluster back up. Also, please consider renewing your support contract to avoid messy situations like this in the future and to get expert technical help from Vertica support.

    Have a great day ahead :-)

    Cheers

    Rahul Choudhary
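
The workaround in this thread was to start the vertica process by hand on each node, reusing the startup command from the beginning of vertica.log with the node name and IP changed. A minimal Python sketch that only builds (and does not run) such per-node commands from a template, using the responder's `test_crane` example command; the second node's name, IP and catalog path in the usage lines are hypothetical:

```python
import shlex
from typing import Dict

# Example command from the thread (the responder's test_crane database, not stats3).
TEMPLATE = ("/opt/vertica/bin/vertica -D /home/dbadmin/test_crane/v_test_crane_node0001_catalog "
            "-C test_crane -n v_test_crane_node0001 -h 10.50.52.41 -p 5433 -P 4803 -Y ipv4")

def command_for_node(template: str, overrides: Dict[str, str]) -> str:
    """Return the template command with the values of the given flags replaced.

    `overrides` maps a flag (e.g. "-n", "-h", "-D") to its new value.
    Flags missing from the template raise KeyError so typos are caught.
    """
    args = shlex.split(template)
    remaining = dict(overrides)
    # Iterate over a copy; each matched flag's following argument is replaced.
    for i, arg in enumerate(args[:-1]):
        if arg in remaining:
            args[i + 1] = remaining.pop(arg)
    if remaining:
        raise KeyError(f"flags not found in template: {sorted(remaining)}")
    return shlex.join(args)

# Hypothetical second node: name, IP and catalog path are made up for illustration.
print(command_for_node(TEMPLATE, {
    "-n": "v_test_crane_node0002",
    "-h": "10.50.52.42",
    "-D": "/home/dbadmin/test_crane/v_test_crane_node0002_catalog",
}))
```

Raising an error for a flag that is not in the template is a deliberate choice: a mistyped flag would otherwise silently keep the first node's values in the generated command.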
