Autostart database
Hello, I have a Vertica cluster installed on 3 hosts. I want the database to start automatically when the hosts boot (that is, when all hosts were shut down and I turn them back on). On every host I added a script, myscript.sh, to /etc/init.d/:
sudo -u myuser /opt/vertica/bin/admintools -t start_db -d test
When I run this script manually, it works and the database starts. But when the script is launched at OS startup, the database does not start. In adminTools.log I don't see any startup errors; I only see "pexpecting vsql command..." and "All nodes in db test are in state DOWN":
2020-06-08 16:42:08.526 agent/752:0x7f195dffb700 [vsql._just_connect] <INFO> pexpecting vsql command: /opt/vertica/bin/vsql --no-vsqlrc -n -p 5433 -U myuser -h 192.168.0.5 test -P pager -A
2020-06-08 16:42:08.984 agent/752:0x7f195dffb700 [vsql._just_connect] <INFO> pexpecting vsql command: /opt/vertica/bin/vsql --no-vsqlrc -n -p 5433 -U myuser -h 192.168.0.6 test -P pager -A
2020-06-08 16:42:09.459 agent/752:0x7f195dffb700 [vsql._just_connect] <INFO> pexpecting vsql command: /opt/vertica/bin/vsql --no-vsqlrc -n -p 5433 -U myuser -h 192.168.0.7 test -P pager -A
2020-06-08 16:43:05.639 admintools/3701:0x7f456298c740 [adminExec.getCollapsedClusterState] <INFO> All nodes in db test are in state DOWN
Why is that?
Answers
It's recommended to start the database from only one node rather than adding the script to every node. Also, it's probably best to start it through systemd rather than calling admintools directly: there should be a script, /etc/init.d/verticad, installed for this; if not, check /opt/vertica/sbin for the init.d and systemd scripts.
I left the script in /etc/init.d/ on only one host, but the problem persisted.
If you run /etc/init.d/verticad start, does it work? If so, then use the command in that file, which is:
su -l "$user" -c "${vertica_prefix}/bin/adminTools -t command_host -c $1"
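Applied to the question's setup, a minimal sketch of an init-style start script for a single host, built around the `command_host` call quoted above. The user `myuser` is taken from the question; the `VERTICA_USER`/`VERTICA_PREFIX` overrides and the `DRY_RUN` print-only switch are hypothetical additions, not part of verticad.

```shell
#!/bin/bash
# Hypothetical single-host replacement for /etc/init.d/myscript.sh, modeled on
# the command_host call that /opt/vertica/sbin/verticad makes.
# Assumptions: DB admin OS user "myuser" (from the question), Vertica under /opt/vertica.
vertica_user="${VERTICA_USER:-myuser}"
vertica_prefix="${VERTICA_PREFIX:-/opt/vertica}"

# Execute a command, or only print it when DRY_RUN is non-empty (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Ask admintools to start this host's node the same way verticad does.
vertica_command_host() {
  local action="${1:-start}"
  run su -l "$vertica_user" -c "${vertica_prefix}/bin/adminTools -t command_host -c ${action}"
}

# Init-style entry point: only "start" is handled, matching the -c start call in this thread.
case "${1:-}" in
  start) vertica_command_host start ;;
esac
```

Installed on one host only, it would be invoked as `/etc/init.d/myscript.sh start`; `DRY_RUN=1 /etc/init.d/myscript.sh start` prints the `su` command instead of executing it. Only `start` is handled, since that is the only `command_host` action shown in this thread.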
No, the verticad service doesn't work.
I run:
systemctl start verticad
Then I run:
systemctl status verticad
verticad.service - Vertica server restart oneshot
Loaded: loaded (/etc/systemd/system/verticad.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Thu 2020-06-11 07:19:43 MSK; 38min ago
Process: 650 ExecStart=/opt/vertica/sbin/verticad start (code=exited, status=0/SUCCESS)
Main PID: 650 (code=exited, status=0/SUCCESS)
Jun 11 07:19:09 verticaserv1 systemd[1]: Starting Vertica server restart oneshot...
Jun 11 07:19:13 verticaserv1 su[706]: (to mydba) root on none
Jun 11 07:19:43 verticaserv1 verticad[650]: Vertica: start OK for users: mydba
Jun 11 07:19:43 verticaserv1 verticad[650]: [ OK ]
Jun 11 07:19:43 verticaserv1 systemd[1]: Started Vertica server restart oneshot.
In the logs (/opt/vertica/log/verticad.log and /var/log/messages) I see only:
vertica process is not running
Vertica: start not OK
Two other places to check: /opt/vertica/log/adminTools.log, since every startup method wraps admintools; and, if nothing shows up there (or to verify what the admintools log shows), the startup.log file in the catalog directory, which gives more detail.
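As a sketch of those two checks, two hypothetical shell helpers: one pulls the most recent cluster state out of the admintools log (the line format is copied from the question's log excerpt), the other tails `startup.log` in a node's catalog directory. The default log path `/opt/vertica/log/adminTools.log` is an assumption; pass the real path if it differs.

```shell
# Hypothetical helpers for inspecting admintools and startup logs.
# Default log path is an assumption; pass another path if your log lives elsewhere.

# Print the state from the most recent "All nodes in db <db> are in state <STATE>"
# line (the format shown in the question's adminTools.log excerpt).
last_cluster_state() {
  local db="$1" log="${2:-/opt/vertica/log/adminTools.log}"
  grep -o "All nodes in db ${db} are in state [A-Z]*" "$log" | tail -n 1 | awk '{ print $NF }'
}

# Show the last N lines (default 50) of startup.log in a node's catalog directory,
# e.g. /home/dbadmin/test_db/v_test_db_node0003_catalog.
show_startup_log() {
  local catalog_dir="$1" lines="${2:-50}"
  tail -n "$lines" "${catalog_dir}/startup.log"
}
```

For example, `last_cluster_state test` would print `DOWN` for the log in the question, and `show_startup_log /home/dbadmin/test_db/v_test_db_node0003_catalog` uses the catalog path from the example later in this thread.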
Usually when I run into this, it means one node didn't shut down cleanly, and the failed transactions have to be rolled back by running admintools -t restart_db with additional recovery options.
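One commonly used recovery option is `restart_db`'s `-e last`, which restarts the database from the last good epoch and discards anything committed after it. A hedged sketch, assuming a database named `test` and a hypothetical `DRY_RUN` switch; check `admintools -t restart_db --help` on your version before relying on the flags.

```shell
# Sketch: restart a database whose nodes did not shut down cleanly by rolling it
# back to the last good epoch. Data committed after that epoch is lost.
# Assumptions: run as the DB admin user; database name passed as $1 (default "test").
ADMINTOOLS="${ADMINTOOLS:-/opt/vertica/bin/admintools}"

# Execute a command, or only print it when DRY_RUN is non-empty (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

restart_from_last_good_epoch() {
  local db="${1:-test}"
  run "$ADMINTOOLS" -t view_cluster -d "$db"        # show current node states first
  run "$ADMINTOOLS" -t restart_db -d "$db" -e last  # restart from the last good epoch
}
```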
Always be sure to shut down an EE-mode (local data and catalog) database before shutting down the hosts, to avoid issues where Vertica has not finished committing to disk.
Why is this service a oneshot? I want a node that goes down to be restarted automatically, and I couldn't find any reason why verticad can't do that periodically on each node: adminTools implements the (re)start policy properly with "command_host".
The reason for the problem above (verticad not working): a "oneshot" service is stopped as soon as "/opt/vertica/sbin/verticad start" exits. Because KillMode defaults to 'control-group', every process still left in the service's control group, including the vertica and spread processes it started, is then killed with SIGTERM.
As a result, I found this in vertica.log:
Spread Mailbox Dequeue:0x7f9b84ff9700 [Comms] <WARNING> error SP_receive: Connection closed by spread
Spread Mailbox Dequeue:0x7f9b84ff9700 [Comms] <WARNING> error SP_receive: The network socket experienced an error. This Spread mailbox will no longer work until the connection is disconnected and then reconnected
and in spread.log:
*** spread pid=*** received termination signal 15
One possible solution is to set KillMode=process (or none) in the [Service] section of verticad.service.
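A sketch of both the diagnosis and the suggested fix, using two hypothetical helpers: the first reads `systemctl show` output and flags a oneshot unit whose leftover processes will be killed (it treats `RemainAfterExit=yes` as safe, since the unit then stays active), the second writes `KillMode=process` into a systemd drop-in rather than editing the packaged unit file. The drop-in file name `killmode.conf` is an arbitrary choice.

```shell
# Hypothetical helpers for the KillMode problem described above.

# Read "Key=Value" lines (as printed by `systemctl show`) on stdin and report whether
# leftover processes in the unit's control group are killed once ExecStart exits.
check_oneshot_killmode() {
  local key value type="" killmode="" remain=""
  while IFS='=' read -r key value; do
    case "$key" in
      Type) type="$value" ;;
      KillMode) killmode="$value" ;;
      RemainAfterExit) remain="$value" ;;
    esac
  done
  if [ "$type" = "oneshot" ] && [ "$remain" != "yes" ] && [ "$killmode" = "control-group" ]; then
    echo "AFFECTED: vertica/spread started by ExecStart get SIGTERM when it exits"
    return 1
  fi
  echo "OK"
}

# Write a drop-in that sets KillMode=process for verticad.service, leaving the packaged
# unit untouched. UNIT_DIR can point at a scratch directory for a trial run.
write_verticad_killmode_dropin() {
  local dropin_dir="${UNIT_DIR:-/etc/systemd/system}/verticad.service.d"
  mkdir -p "$dropin_dir" || return 1
  cat > "${dropin_dir}/killmode.conf" <<'EOF'
[Service]
# Signal only the main process (the verticad script) when the unit stops; leftover vertica and spread keep running.
KillMode=process
EOF
  printf '%s\n' "${dropin_dir}/killmode.conf"
}
```

On a host this would be `systemctl show -p Type -p KillMode -p RemainAfterExit verticad | check_oneshot_killmode`, then, as root, `write_verticad_killmode_dropin && systemctl daemon-reload`. `systemctl edit verticad` creates an equivalent drop-in interactively.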
I apologise for hijacking someone else's topic, but I am struggling with the same issue on CentOS 7. This verticad systemd service does not work.
I'm using the Vertica 10.0.0 RPM.
systemctl start verticad
The verticad log file then contains the following:
When checking the admintools.log file, I see this:
And when I execute the same statement as the dbadmin user:
[dbadmin@db-simpl ~]$ '/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/tools/ATMain.py' -t command_host -cstart
the database still does not start.
Raul
I apologise for hijacking someone else's topic, but I am having exactly the same issue on CentOS 7: this verticad systemd service just does not work.
I'm using the Vertica 10.0.0 RPM.
When I issue:
systemctl start verticad
the following entries are written to the log file (/opt/vertica/log/verticad.log):
If I issue the same statement as the dbadmin user:
'/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/tools/ATMain.py' -t command_host -cstart
I get the same errors:
And the database does not start.
Raul
Raul, this feature does not automatically restart nodes if the entire database is DOWN:
https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/AdministratorsGuide/AdminTools/SettingTheRestartPolicy.htm
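A minimal sketch of setting that policy from the command line with admintools' `set_restart_policy` tool, assuming the database name is passed in and the three policy values from the linked page (`never`, `ksafe`, `always`). `DRY_RUN` is a hypothetical print-only switch, and `apply_restart_policy` is a hypothetical wrapper name.

```shell
# Sketch: validate and apply an admintools restart policy for one database.
# Assumptions: run as the DB admin user; admintools under /opt/vertica/bin.

# Execute a command, or only print it when DRY_RUN is non-empty (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

apply_restart_policy() {
  local db="$1" policy="$2"
  case "$policy" in
    never|ksafe|always) ;;   # the three documented restart policies
    *) echo "unknown restart policy: $policy" >&2; return 2 ;;
  esac
  run /opt/vertica/bin/admintools -t set_restart_policy -d "$db" -p "$policy"
}
```

For the question's database this would be `apply_restart_policy test always`. As the answer notes, the policy restarts individual nodes; it does not bring back a database whose nodes are all DOWN.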
You should be able to start the Vertica nodes manually using /opt/vertica/bin/vertica.
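Before starting nodes by hand, a small hypothetical filter can list which nodes are DOWN, using the same `nodes` query as the example below with vsql's unaligned (`-A`) and tuples-only (`-t`) output. It only works while at least one node is UP, since vsql needs a running node to connect to.

```shell
# Sketch: print "node_name node_address" for every DOWN node.
# Input: pipe-delimited rows from vsql -At, e.g. v_test_db_node0003|172.16.61.178|DOWN
down_nodes() {
  awk -F'|' '$3 == "DOWN" { print $1, $2 }'
}
```

For example, `vsql -At -c "SELECT node_name, node_address, node_state FROM nodes;" | down_nodes` would print `v_test_db_node0003 172.16.61.178` for the state shown after the `kill -9` in the example.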
Below is an example, and I also attached an old script that will start all nodes (and thus the entire DB). I gave it to a client in the past to restart their cluster.
[dbadmin@SE-Sandbox-26-node2 ~]$ vsql -c "SELECT node_name, node_address, node_state FROM nodes;"
     node_name      | node_address  | node_state
--------------------+---------------+------------
 v_test_db_node0001 | 172.16.61.176 | UP
 v_test_db_node0002 | 172.16.61.177 | UP
 v_test_db_node0003 | 172.16.61.178 | UP
(3 rows)
[dbadmin@SE-Sandbox-26-node2 ~]$ ssh 172.16.61.178 "ps -ef | grep \"[v]ertica -D\""
dbadmin 410150 1 1 Aug19 ? 01:10:58 /opt/vertica/bin/vertica -D /home/dbadmin/test_db/v_test_db_node0003_catalog -C test_db -n v_test_db_node0003 -h 172.16.61.178 -p 5433 -P 4803 -Y ipv4 -e -S 271
[dbadmin@SE-Sandbox-26-node2 ~]$ ssh 172.16.61.178 "kill -9 410150"
[dbadmin@SE-Sandbox-26-node2 ~]$ vsql -c "SELECT node_name, node_address, node_state FROM nodes;"
     node_name      | node_address  | node_state
--------------------+---------------+------------
 v_test_db_node0001 | 172.16.61.176 | UP
 v_test_db_node0002 | 172.16.61.177 | UP
 v_test_db_node0003 | 172.16.61.178 | DOWN
(3 rows)
[dbadmin@SE-Sandbox-26-node2 ~]$ ssh 172.16.61.178 "/opt/vertica/bin/vertica -D /home/dbadmin/test_db/v_test_db_node0003_catalog -C test_db -n v_test_db_node0003 -h 172.16.61.178 -p 5433 -P 4803 -Y ipv4 -e -S 271"
Connecting to spread at 4803
(Spread Toolkit copyright, license and contributor banner, Version 5.00.03 Built 22/Feb/2020, omitted)
Conf_load_conf_file: using file: /home/dbadmin/test_db/v_test_db_node0003_catalog/spread.conf
Conf_load_conf_file: vertica version is 1845
Setting active IP version to 2
Configured daemon 'N172016061176' with IP '172.16.61.176'
Auto-generated virtual ID = '2956791980' for daemon 'N172016061176'
Daemon 'N172016061176' will have virtual ID = '2956791980'
Successfully configured Segment 0 [172.16.61.176]:4803 with 1 procs: N172016061176: 172.16.61.176
Configured daemon 'N172016061177' with IP '172.16.61.177'
Auto-generated virtual ID = '2973569196' for daemon 'N172016061177'
Daemon 'N172016061177' will have virtual ID = '2973569196'
Successfully configured Segment 1 [172.16.61.177]:4803 with 1 procs: N172016061177: 172.16.61.177
Configured daemon 'N172016061178' with IP '172.16.61.178'
Auto-generated virtual ID = '2990346412' for daemon 'N172016061178'
Daemon 'N172016061178' will have virtual ID = '2990346412'
Successfully configured Segment 2 [172.16.61.178]:4803 with 1 procs: N172016061178: 172.16.61.178
Starting UDxSideProcess for language C++ with command line: /opt/vertica/bin/vertica-udx-C++ 3 v_test_db_node0003-523725:0x2 debug-log-off /home/dbadmin/test_db/v_test_db_node0003_catalog/UDxLogs 4
Starting UDxSideProcess for language C++ with command line: /opt/vertica/bin/vertica-udx-C++ 3 v_test_db_node0003-523725:0xd debug-log-off /home/dbadmin/test_db/v_test_db_node0003_catalog/UDxLogs 4
^CKilled by signal 2.
[dbadmin@SE-Sandbox-26-node2 ~]$ vsql -c "SELECT node_name, node_address, node_state FROM nodes;"
     node_name      | node_address  | node_state
--------------------+---------------+------------
 v_test_db_node0001 | 172.16.61.176 | UP
 v_test_db_node0002 | 172.16.61.177 | UP
 v_test_db_node0003 | 172.16.61.178 | UP
(3 rows)
OK, thanks. I changed the restart policy to:
Always: Node on a single node database is restarted automatically.
Now, the systemd start command works.
systemctl start verticad
But the service stays in the inactive state, even though the database is actually up and running:
Raul
The service type is oneshot, so the ExecStart command runs only once, when the service starts.
Any idea how to change this so that it is not a oneshot service?
Raul
Ah, never mind.
vi /etc/systemd/system/verticad.service
I added:
And then:
systemctl daemon-reload
Now I can stop/start it with systemctl.
Raul
Periodic retries:
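```
# /etc/systemd/system/verticad.service
[Unit]
Description=Vertica server restart

[Service]
Type=oneshot
ExecStart=/opt/vertica/sbin/verticad start
# Also treat exit status 1 from verticad as success, so the unit is not marked failed
SuccessExitStatus=1
# Do not kill the vertica/spread processes left running after verticad exits
KillMode=none
```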
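```
# /etc/systemd/system/verticad.timer
[Unit]
Description=Vertica server restart timer

[Timer]
# Activate verticad.service (same name as this timer) again 10 minutes after its last activation
OnUnitActiveSec=10min
# First activation 1 minute after boot
OnBootSec=1min

[Install]
WantedBy=multi-user.target
```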
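To put the two units into service, a short sketch, assuming both files were saved exactly as shown and the commands run as root; `DRY_RUN` is a hypothetical print-only switch and `enable_verticad_timer` a hypothetical wrapper name. Only the timer needs enabling, because it is the unit with an `[Install]` section and it activates `verticad.service` itself.

```shell
# Sketch: load the unit files and start the periodic restart timer.

# Execute a command, or only print it when DRY_RUN is non-empty (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

enable_verticad_timer() {
  run systemctl daemon-reload                  # pick up the new or edited unit files
  run systemctl enable --now verticad.timer    # enable for future boots and start it now
  run systemctl list-timers verticad.timer     # show last and next trigger times
}
```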