Unable to start db

Hi,
I just set up a three-node test cluster (Vertica 10 Community Edition) on Linux VMs. I used CentOS images managed by Oracle VirtualBox. The machines can communicate with each other, and apparently everything is OK at the OS level.
The Vertica installation went fine, but I cannot start the freshly created database.

Using adminTools I get:
*** Starting database: kaka ***
Starting nodes:
v_kaka_node0001 (192.168.1.201)
v_kaka_node0002 (192.168.1.202)
v_kaka_node0003 (192.168.1.203)
Starting Vertica on all nodes. Please wait, databases with a large catalog may take a while to initialize.
Node Status: v_kaka_node0001: (DOWN) v_kaka_node0002: (DOWN) v_kaka_node0003: (DOWN)
...then I wait a few minutes and get:
Nodes in TRANSITIONAL state: 192.168.1.201, 192.168.1.203, 192.168.1.202
Nodes DOWN: v_kaka_node0001, v_kaka_node0002, v_kaka_node0003 (may be still initializing).
Server startup was successful on some nodes, but not complete

In adminTools.log I have a suspicious message:
2020-07-15 14:05:16.465 at_exec/6637:0x7ffb6e408740 [ATRunner.exec_module] running: module=vertica.engine.api.db_client.module version=1.0 args={"description": "get cluster status", "cluster_status": true, "database": "kaka", "port": 5433}
2020-07-15 14:05:16.487 at_exec/6637:0x7ffb6e408740 [ATRunner.exec_module] result: status=Failure host=None content={"description": "get cluster status", "failure_reason": "ConnectionError: Failed to establish a connection to the primary server or any backup address.", "failure_operation": "connect-secure", "failure_details": {"stack": "Traceback (most recent call last):\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/engine/api/db_client/executor.py\", line 133, in _run_cluster_status\n conn = conn_helper.make_connection()\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/engine/api/db_client/connection_helper.py\", line 149, in make_connection\n conn = self.try_secure()\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/engine/api/db_client/connection_helper.py\", line 180, in try_secure\n return self._finalize_conn(self.conn_method(**args))\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/engine/api/db_client/cluster_status.py\", line 66, in cluster_status_connect\n return ClusterStatusConnection(kwargs) # type: ignore\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica_python/vertica/connection.py\", line 280, in __init__\n self.startup_connection()\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica_python/vertica/connection.py\", line 609, in startup_connection\n self.write(messages.Startup(user, database, session_label, os_user_name))\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/engine/api/db_client/cluster_status.py\", line 72, in write\n return super().write(message)\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica_python/vertica/connection.py\", line 499, in write\n sock = self._socket()\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica_python/vertica/connection.py\", line 361, in _socket\n raw_socket = self.establish_connection()\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica_python/vertica/connection.py\", line 480, in establish_connection\n raise errors.ConnectionError(err_msg)\nvertica_python.errors.ConnectionError: Failed to establish a connection to the primary server or any backup address.\n", "type": "", "message": "Failed to establish a connection to the primary server or any backup address."}, "runner_ack": true} error_message=None

dbLog looks ok:
Conf_load_conf_file: using file: /home/dbadmin/kaka/v_kaka_node0001_catalog/spread.conf
Conf_load_conf_file: vertica version is 7
Setting active IP version to 2
Configured daemon 'N192168001201' with IP '192.168.1.201'
Auto-generated virtual ID = '3372329152' for daemon 'N192168001201'
Daemon 'N192168001201' will have virtual ID = '3372329152'
Configured daemon 'N192168001202' with IP '192.168.1.202'
Auto-generated virtual ID = '3389106368' for daemon 'N192168001202'
Daemon 'N192168001202' will have virtual ID = '3389106368'
Configured daemon 'N192168001203' with IP '192.168.1.203'
Auto-generated virtual ID = '3405883584' for daemon 'N192168001203'
Daemon 'N192168001203' will have virtual ID = '3405883584'
Successfully configured Segment 0 [192.168.1.255]:4803 with 3 procs:
N192168001201: 192.168.1.201
N192168001202: 192.168.1.202
N192168001203: 192.168.1.203
Connected to spread on local domain socket /opt/vertica/spread/tmp/4803
auto restart closing socket

startup.log ends with:
{
"node" : "v_kaka_node0001",
"stage" : "Waiting for Cluster Invite",
"text" : "Prepare to be invited",
"timestamp" : "2020-07-15 13:54:54.001"
}
{
"node" : "v_kaka_node0001",
"stage" : "Waiting for Cluster Invite",
"text" : "Ready to be invited",
"timestamp" : "2020-07-15 13:54:54.002"
}

vertica.log ends with normal messages:
2020-07-15 14:25:48.000 Timer Service:0x7fb2f27fc700 [Util] Task 'FeatureUseLogger' enabled
2020-07-15 14:25:48.000 Timer Service:0x7fb2f27fc700 [Util] Task 'LicenseSizeAuditor' enabled
2020-07-15 14:25:48.000 Cluster Inviter:0x7fb2d9ffb700 [Comms] My global sequence value is 136008

I checked netstat and it looks fine:
[dbadmin@cent1 v_kaka_node0001_catalog]$ netstat -utln|grep 5433
udp 0 0 192.168.1.201:5433 0.0.0.0:*

Where should I look for the problem?
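
A note on the netstat output above: it lists only a UDP listener on 5433, whereas the failing cluster-status call in adminTools.log goes through vertica_python, which connects over TCP. A minimal sketch for probing the TCP side from any node follows. The helper names tcp_open and check_nodes are hypothetical, and the sketch assumes bash and the coreutils timeout command are available. It cannot test UDP, so spread traffic still needs a tool such as nmap -sU.

```shell
# Hypothetical helper: succeeds (exit 0) if a TCP connection to host:port
# can be opened within 2 seconds. Uses bash's /dev/tcp pseudo-device.
tcp_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Hypothetical helper: probe one port on each host given as an argument
# and print one result line per host.
check_nodes() {
  port=$1
  shift
  for host in "$@"; do
    if tcp_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port unreachable"
    fi
  done
}
```

For example, `check_nodes 5433 192.168.1.201 192.168.1.202 192.168.1.203` (the node IPs from this thread) prints one line per node. An "unreachable" result can mean either that a firewall is in the way or simply that Vertica is not yet listening on that port.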

Answers

  • LenoyJ - Employee
    edited July 2020

    Other than ports and firewall, since this is a virtualized environment, check whether you enabled point-to-point communication when you created your cluster.

  • @LenoyJ said:
    Other than ports and firewall, since this is a virtualized environment, check whether you enabled point-to-point communication when you created your cluster.

    Hi,
    I installed it using "install_vertica --hosts --rpm " only.
    Since all hosts are in the same subnet, I ignored this option.
    Should I reinstall with the '-T' option, or can I reconfigure it?
    I have no firewalls. I will install nmap to test UDP traffic.
    Regards

  • LenoyJ - Employee
    edited August 2020

    For --point-to-point, the docs say:

    Also use this option for all virtual environment installations, whether the virtual servers are on the same subnet or not.

    You could try reconfiguring using update_vertica with the parameters -s, -r, -T, and -S. But I would simply reinstall.

  • Hi,
    I did it :)
    Indeed, it was necessary to switch off the default firewalld:
    systemctl mask firewalld
    systemctl disable firewalld
    systemctl stop firewalld

    Before:
    [root@cent2 centos]# nmap -sU -p 5433 192.168.1.202
    Starting Nmap 7.70 ( https://nmap.org ) at 2020-07-17 11:34 CEST
    Nmap scan report for cent2 (192.168.1.202)
    Host is up (0.000027s latency).

    PORT STATE SERVICE
    5433/udp closed pyrrho


    After:
    [root@cent2 centos]# nmap -sU -p 5433 192.168.1.202
    Starting Nmap 7.70 ( https://nmap.org ) at 2020-07-17 11:39 CEST
    Nmap scan report for cent3 (192.168.1.203)
    Host is up (0.00032s latency).

    PORT STATE SERVICE
    5433/udp open|filtered pyrrho

    It was not necessary to reinstall with the "point-to-point" option.
    Thank you for your help.
    Regards
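
Disabling firewalld entirely worked here on a test cluster. An alternative, sketched below, is to keep firewalld running and open only the ports Vertica uses. The port list is an assumption recalled from Vertica's port requirements (5433 and 5434 TCP, 5433 UDP, and the spread ports 4803 TCP/UDP, 4804 UDP, 6543 UDP); verify it against the documentation for your version. print_firewall_cmds is a hypothetical helper that only prints the firewall-cmd commands, so they can be reviewed before being run as root on every node.

```shell
# Assumed Vertica port list; check the documentation for your version.
VERTICA_PORTS="5433/tcp 5434/tcp 5433/udp 4803/tcp 4803/udp 4804/udp 6543/udp"

# Hypothetical dry-run helper: prints the firewall-cmd invocations
# instead of executing them, one per port, followed by a reload.
print_firewall_cmds() {
  for p in $VERTICA_PORTS; do
    echo "firewall-cmd --permanent --add-port=$p"
  done
  echo "firewall-cmd --reload"
}
```

Calling `print_firewall_cmds` shows the commands. Once they look right, they can be run by hand or piped to a root shell on each node.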
