Unable to start db
Hi,
I just set up a three-node test cluster (Vertica 10 Community Edition) on Linux VMs, using CentOS images managed by Oracle VirtualBox. The machines can communicate with each other, and everything appears to be OK at the OS level.
The Vertica installation went fine, but I cannot start the freshly created database.
Using adminTools, I get:
*** Starting database: kaka ***
Starting nodes:
v_kaka_node0001 (192.168.1.201)
v_kaka_node0002 (192.168.1.202)
v_kaka_node0003 (192.168.1.203)
Starting Vertica on all nodes. Please wait, databases with a large catalog may take a while to initialize.
Node Status: v_kaka_node0001: (DOWN) v_kaka_node0002: (DOWN) v_kaka_node0003: (DOWN)
...then, after waiting a few minutes, I get:
Nodes in TRANSITIONAL state: 192.168.1.201, 192.168.1.203, 192.168.1.202
Nodes DOWN: v_kaka_node0001, v_kaka_node0002, v_kaka_node0003 (may be still initializing).
Server startup was successful on some nodes, but not complete
In adminTools.log there is a suspicious message:
2020-07-15 14:05:16.465 at_exec/6637:0x7ffb6e408740 [ATRunner.exec_module] running: module=vertica.engine.api.db_client.module version=1.0 args={"description": "get cluster status", "cluster_status": true, "database": "kaka", "port": 5433}
2020-07-15 14:05:16.487 at_exec/6637:0x7ffb6e408740 [ATRunner.exec_module] result: status=Failure host=None content={"description": "get cluster status", "failure_reason": "ConnectionError: Failed to establish a connection to the primary server or any backup address.", "failure_operation": "connect-secure", "failure_details": {"stack": "Traceback (most recent call last):\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/engine/api/db_client/executor.py\", line 133, in _run_cluster_status\n conn = conn_helper.make_connection()\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/engine/api/db_client/connection_helper.py\", line 149, in make_connection\n conn = self.try_secure()\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/engine/api/db_client/connection_helper.py\", line 180, in try_secure\n return self._finalize_conn(self.conn_method(**args))\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/engine/api/db_client/cluster_status.py\", line 66, in cluster_status_connect\n return ClusterStatusConnection(kwargs) # type: ignore\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica_python/vertica/connection.py\", line 280, in __init__\n self.startup_connection()\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica_python/vertica/connection.py\", line 609, in startup_connection\n self.write(messages.Startup(user, database, session_label, os_user_name))\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/engine/api/db_client/cluster_status.py\", line 72, in write\n return super().write(message)\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica_python/vertica/connection.py\", line 499, in write\n sock = self._socket()\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica_python/vertica/connection.py\", line 361, in _socket\n raw_socket = self.establish_connection()\n File \"/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica_python/vertica/connection.py\", line 480, in establish_connection\n raise errors.ConnectionError(err_msg)\nvertica_python.errors.ConnectionError: Failed to establish a connection to the primary server or any backup address.\n", "type": "", "message": "Failed to establish a connection to the primary server or any backup address."}, "runner_ack": true} error_message=None
The dbLog looks OK:
Conf_load_conf_file: using file: /home/dbadmin/kaka/v_kaka_node0001_catalog/spread.conf
Conf_load_conf_file: vertica version is 7
Setting active IP version to 2
Configured daemon 'N192168001201' with IP '192.168.1.201'
Auto-generated virtual ID = '3372329152' for daemon 'N192168001201'
Daemon 'N192168001201' will have virtual ID = '3372329152'
Configured daemon 'N192168001202' with IP '192.168.1.202'
Auto-generated virtual ID = '3389106368' for daemon 'N192168001202'
Daemon 'N192168001202' will have virtual ID = '3389106368'
Configured daemon 'N192168001203' with IP '192.168.1.203'
Auto-generated virtual ID = '3405883584' for daemon 'N192168001203'
Daemon 'N192168001203' will have virtual ID = '3405883584'
Successfully configured Segment 0 [192.168.1.255]:4803 with 3 procs:
N192168001201: 192.168.1.201
N192168001202: 192.168.1.202
N192168001203: 192.168.1.203
Connected to spread on local domain socket /opt/vertica/spread/tmp/4803
auto restart closing socket
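The line "Successfully configured Segment 0 [192.168.1.255]:4803 with 3 procs" indicates that spread was set up with a broadcast segment: 192.168.1.255 is the broadcast address of 192.168.1.0/24. As a minimal sketch, assuming all three VMs use a /24 netmask (an assumption consistent with that broadcast address, but worth confirming with `ip addr` on each node), the standard `ipaddress` module can check that the node IPs share one subnet and derive its broadcast address:

```python
import ipaddress

# Assumption: every VM uses a /24 netmask (255.255.255.0). This matches the
# 192.168.1.255 broadcast address spread reports, but verify it on each node.
NETMASK_PREFIX = 24

NODE_IPS = ["192.168.1.201", "192.168.1.202", "192.168.1.203"]


def subnet_of(ip, prefix=NETMASK_PREFIX):
    """Return the IPv4 network that contains `ip` for the given prefix length."""
    return ipaddress.ip_interface(f"{ip}/{prefix}").network


def common_broadcast(ips, prefix=NETMASK_PREFIX):
    """Return the shared broadcast address, or None if the IPs span several subnets."""
    networks = {subnet_of(ip, prefix) for ip in ips}
    if len(networks) != 1:
        return None
    return networks.pop().broadcast_address


if __name__ == "__main__":
    print(common_broadcast(NODE_IPS))  # 192.168.1.255
```

Sharing a subnet is what broadcast mode relies on; as I understand the Vertica install docs, the point-to-point option discussed later in the thread is recommended when nodes are on different subnets and also for virtualized environments.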
startup.log ends with:
{
"node" : "v_kaka_node0001",
"stage" : "Waiting for Cluster Invite",
"text" : "Prepare to be invited",
"timestamp" : "2020-07-15 13:54:54.001"
}
{
"node" : "v_kaka_node0001",
"stage" : "Waiting for Cluster Invite",
"text" : "Ready to be invited",
"timestamp" : "2020-07-15 13:54:54.002"
}
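startup.log is a series of pretty-printed JSON objects written back to back, so it is not one JSON document and `json.load` cannot read it directly. A minimal sketch, assuming the file follows the format of the excerpt above, that decodes the stream with `json.JSONDecoder.raw_decode` and reports the last stage reached by each node. In practice you would read the file from the node's catalog directory (for example /home/dbadmin/kaka/v_kaka_node0001_catalog/startup.log, a path inferred from the spread.conf location in dbLog):

```python
import json

# Excerpt copied from the startup.log shown above.
SAMPLE = """
{
"node" : "v_kaka_node0001",
"stage" : "Waiting for Cluster Invite",
"text" : "Prepare to be invited",
"timestamp" : "2020-07-15 13:54:54.001"
}
{
"node" : "v_kaka_node0001",
"stage" : "Waiting for Cluster Invite",
"text" : "Ready to be invited",
"timestamp" : "2020-07-15 13:54:54.002"
}
"""


def iter_json_objects(text):
    """Yield each object from a stream of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def last_stage_per_node(text):
    """Map each node name to the (stage, text) pair of its most recent entry."""
    last = {}
    for entry in iter_json_objects(text):
        last[entry["node"]] = (entry["stage"], entry["text"])
    return last


if __name__ == "__main__":
    for node, (stage, detail) in last_stage_per_node(SAMPLE).items():
        print(f"{node}: {stage} / {detail}")
```

For this excerpt the node's last stage is "Waiting for Cluster Invite / Ready to be invited", which is consistent with a node that started but never heard from its peers.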
vertica.log ends with normal messages:
2020-07-15 14:25:48.000 Timer Service:0x7fb2f27fc700 [Util] Task 'FeatureUseLogger' enabled
2020-07-15 14:25:48.000 Timer Service:0x7fb2f27fc700 [Util] Task 'LicenseSizeAuditor' enabled
2020-07-15 14:25:48.000 Cluster Inviter:0x7fb2d9ffb700 [Comms] My global sequence value is 136008
I checked netstat and it looks fine:
[dbadmin@cent1 v_kaka_node0001_catalog]$ netstat -utln|grep 5433
udp 0 0 192.168.1.201:5433 0.0.0.0:*
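`netstat -utln` lists both TCP and UDP listeners, yet the output above shows only a UDP socket on 5433, while the failed "get cluster status" call in adminTools.log connects to port 5433 over TCP (vertica_python opens a TCP socket). A minimal sketch using only Python's `socket` module to check whether 5433/tcp is reachable on each node; the node list and timeout are assumptions taken from this thread:

```python
import socket

# Assumption: 5433/tcp is the client port used by adminTools' cluster-status
# check (adminTools.log above shows "port": 5433).
NODES = ["192.168.1.201", "192.168.1.202", "192.168.1.203"]
CLIENT_PORT = 5433


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses
        # OSError) and unreachable or unresolvable hosts.
        return False


if __name__ == "__main__":
    for host in NODES:
        state = "open" if tcp_port_open(host, CLIENT_PORT) else "closed/filtered"
        print(f"{host}:{CLIENT_PORT}/tcp {state}")
```

Running it from every node is the useful part: a port that is reachable from the node itself but not from its peers points toward a firewall rather than toward the database.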
Where should I look for the problem?
Best Answer
Jim_Knicely - Select Field - Administrator
Do you have a firewall running on each node? Check out the "Firewall Considerations" page in the Vertica documentation.
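Instead of switching the firewall off entirely, one could open only the ports Vertica needs. The sketch below just prints the corresponding `firewall-cmd` commands (to be run as root on each node). The port list is reproduced from memory of the "Firewall Considerations" page and is an assumption; check it against the documentation for your Vertica version before using it:

```python
# Assumption: port list recalled from the Vertica "Firewall Considerations"
# page; verify it against the docs for your version before applying it.
VERTICA_PORTS = [
    (22, "tcp", "sshd"),
    (5433, "tcp", "Vertica client connections"),
    (5434, "tcp", "Vertica intra- and inter-cluster communication"),
    (5433, "udp", "Vertica spread monitoring"),
    (5444, "tcp", "MC agent"),
    (5450, "tcp", "Management Console"),
    (4803, "tcp", "spread client connections"),
    (4803, "udp", "spread daemon to daemon"),
    (4804, "udp", "spread daemon to daemon"),
    (6543, "udp", "spread monitor to daemon"),
]


def firewalld_commands(ports=VERTICA_PORTS):
    """Build firewall-cmd invocations that permanently open each port, then reload."""
    cmds = [f"firewall-cmd --permanent --add-port={port}/{proto}" for port, proto, _ in ports]
    # Permanent rules only take effect after a reload.
    cmds.append("firewall-cmd --reload")
    return cmds


if __name__ == "__main__":
    for cmd in firewalld_commands():
        print(cmd)
```

Printing rather than executing keeps the sketch harmless; the commands apply to firewalld's default zone because no `--zone` is given.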
Answers
Besides ports and the firewall: since this is a virtualized environment, check whether you enabled point-to-point communication when you created your cluster.
Hi,
I installed it using only "install_vertica --hosts --rpm ".
Since all hosts are in the same subnet, I skipped this option.
Should I reinstall with the '-T' option, or can I reconfigure the existing installation?
I have no firewalls. I will install nmap to test UDP traffic.
Regards
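nmap's UDP scan (`-sU`) infers port state from whether an ICMP port-unreachable reply comes back, so a missing reply can mean either "open" or "filtered". A more direct test is to send a datagram from one node and confirm it arrives on another. A minimal standard-library sketch; the script name `udp_check.py` is hypothetical, and using port 4803 (the spread port from the dbLog line above) assumes the database is stopped first so the port is free to bind:

```python
import socket
import sys

# Assumption: 4803 is the spread port from dbLog ("Segment 0 [192.168.1.255]:4803").
# Stop the database first so nothing else is bound to it during the test.
TEST_PORT = 4803


def open_receiver(bind_host, port, timeout=5.0):
    """Bind a UDP socket that will wait at most `timeout` seconds for a datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_host, port))
    sock.settimeout(timeout)
    return sock


def wait_for_datagram(sock, bufsize=1024):
    """Return (payload, sender_address), or None if nothing arrived before the timeout."""
    try:
        return sock.recvfrom(bufsize)
    except socket.timeout:
        return None
    finally:
        sock.close()


def udp_send(host, port, payload=b"ping"):
    """Send one datagram. UDP gives no delivery confirmation, hence the receiver side."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Receiving node:  python3 udp_check.py listen
    # Sending node:    python3 udp_check.py send 192.168.1.201
    if sys.argv[1] == "listen":
        result = wait_for_datagram(open_receiver("0.0.0.0", TEST_PORT))
        if result is None:
            print("no datagram before timeout")
        else:
            payload, sender = result
            print(f"received {payload!r} from {sender[0]}")
    else:
        udp_send(sys.argv[2], TEST_PORT)
```

Start the listener first, then send from a different node; a timeout on the listener means datagrams between those two nodes are being dropped somewhere along the way.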
For --point-to-point, see the docs. You could try reconfiguring by running update_vertica with the parameters -s, -r, -T and -S, but I would simply reinstall.
Hi,
I got it working.
It turned out that the default firewalld service had to be switched off:
systemctl mask firewalld
systemctl disable firewalld
systemctl stop firewalld
Before:
[root@cent2 centos]# nmap -sU -p 5433 192.168.1.202
Starting Nmap 7.70 ( https://nmap.org ) at 2020-07-17 11:34 CEST
Nmap scan report for cent2 (192.168.1.202)
Host is up (0.000027s latency).
PORT STATE SERVICE
5433/udp closed pyrrho
After:
[root@cent2 centos]# nmap -sU -p 5433 192.168.1.202
Starting Nmap 7.70 ( https://nmap.org ) at 2020-07-17 11:39 CEST
Nmap scan report for cent3 (192.168.1.203)
Host is up (0.00032s latency).
PORT STATE SERVICE
5433/udp open|filtered pyrrho
It was not necessary to reinstall with the "point-to-point" option.
Thank you for your help.
Regards