autostart database
aleksss55
Community Edition User ✭
Hello, I have a Vertica cluster installed on 3 hosts. I want the database to start automatically when the hosts boot (i.e., after all hosts were shut down and I turn them back on). On every host I added a script, myscript.sh, to /etc/init.d/:
sudo -u myuser /opt/vertica/bin/admintools -t start_db -d test
When I run this script manually, it works and the database starts. But when the script runs at OS startup, the database does not start. In adminTools.log I see no startup errors, only "pexpecting vsql command..." and "All nodes in db test are in state DOWN":
2020-06-08 16:42:08.526 agent/752:0x7f195dffb700 [vsql._just_connect] <INFO> pexpecting vsql command: /opt/vertica/bin/vsql --no-vsqlrc -n -p 5433 -U myuser -h 192.168.0.5 test -P pager -A
2020-06-08 16:42:08.984 agent/752:0x7f195dffb700 [vsql._just_connect] <INFO> pexpecting vsql command: /opt/vertica/bin/vsql --no-vsqlrc -n -p 5433 -U myuser -h 192.168.0.6 test -P pager -A
2020-06-08 16:42:09.459 agent/752:0x7f195dffb700 [vsql._just_connect] <INFO> pexpecting vsql command: /opt/vertica/bin/vsql --no-vsqlrc -n -p 5433 -U myuser -h 192.168.0.7 test -P pager -A
2020-06-08 16:43:05.639 admintools/3701:0x7f456298c740 [adminExec.getCollapsedClusterState] <INFO> All nodes in db test are in state DOWN
Why is that?
It's recommended to start the database from only one node rather than adding the script to every node. Also, it's probably best to use systemd rather than calling admintools directly: there should be a script, /etc/init.d/verticad, installed for this; if not, check /opt/vertica/sbin for the init.d and systemd scripts.
I left the script in /etc/init.d/ on only one host, but the problem persisted.
If you run
/etc/init.d/verticad start
does it work? If so, use the command in that file, which is:
su -l "$user" -c "${vertica_prefix}/bin/adminTools -t command_host -c $1"
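To try that command by hand, it helps to see what it expands to. A minimal sketch, assuming the database administrator account is dbadmin and Vertica is installed under /opt/vertica (neither is stated in this thread); the function names and the VERTICA_USER / VERTICA_PREFIX variables are hypothetical, not Vertica settings:

```shell
# Hypothetical helpers mirroring the command quoted from /etc/init.d/verticad.
# Defaults (dbadmin, /opt/vertica) are assumptions; override via the variables.

# Print the adminTools command that verticad would run for an action (e.g. start).
verticad_cmd() {
  printf '%s/bin/adminTools -t command_host -c %s\n' \
    "${VERTICA_PREFIX:-/opt/vertica}" "${1:?action required, e.g. start}"
}

# Run that command through a login shell of the DB admin user, as the init script does.
run_verticad_cmd() {
  su -l "${VERTICA_USER:-dbadmin}" -c "$(verticad_cmd "$1")"
}
```

Run as root, `run_verticad_cmd start` executes the quoted command in the foreground, so any error it prints lands on the terminal instead of in a boot log.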
No, the verticad service doesn't work.
I ran:
systemctl start verticad
Then I ran:
systemctl status verticad
verticad.service - Vertica server restart oneshot
Loaded: loaded (/etc/systemd/system/verticad.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Thu 2020-06-11 07:19:43 MSK; 38min ago
Process: 650 ExecStart=/opt/vertica/sbin/verticad start (code=exited, status=0/SUCCESS)
Main PID: 650 (code=exited, status=0/SUCCESS)
In the logs (/opt/vertica/log/verticad.log and /var/log/messages) I see only:
vertica process is not running
Vertica: start not OK
Two other places to check: /opt/vertica/log/adminTools.log, since every startup method wraps admintools; and, if nothing shows up there (or to verify what the admintools log shows), the startup.log file in the catalog directory, which contains more detail.
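A small sketch for gathering both logs in one go. The adminTools.log path is the one used in this thread; the catalog base directory is an assumption (it is whatever catalog path was chosen when the database was created, for example something like /home/dbadmin/test), so it is passed as an argument. The function names and the ADMINTOOLS_LOG override are hypothetical:

```shell
# Hypothetical helpers for the log check described above.
# ADMINTOOLS_LOG defaults to the path used in this thread; override if needed.

# List every startup.log below a catalog base directory (passed as $1).
find_startup_logs() {
  find "${1:?catalog base directory required}" -type f -name startup.log 2>/dev/null
}

# Print the last N lines (default 40) of adminTools.log and of each startup.log.
show_start_logs() {
  n="${2:-40}"
  log="${ADMINTOOLS_LOG:-/opt/vertica/log/adminTools.log}"
  echo "== $log"
  tail -n "$n" "$log" 2>/dev/null || echo "(cannot read $log)"
  find_startup_logs "$1" | while IFS= read -r f; do
    echo "== $f"
    tail -n "$n" "$f"
  done
}
```

Usage on each node: `show_start_logs /home/dbadmin/test 80` (path hypothetical).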
Usually when I run into this, it means one node didn't shut down cleanly, and failed transactions have to be rolled back by running admintools -t restart_db with additional recovery options.
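For reference, a sketch of such a recovery restart. Here `-e last` asks admintools to restart from the last good epoch, which discards work committed after that epoch, so check the options with `admintools -t restart_db --help` on your version first. The function name and the ADMINTOOLS override (there only to make the snippet testable) are hypothetical:

```shell
# Hypothetical wrapper: restart database $1 from its last good epoch (-e last).
# Run as the database administrator. ADMINTOOLS may point at another binary for testing.
restart_from_last_good_epoch() {
  "${ADMINTOOLS:-/opt/vertica/bin/admintools}" -t restart_db \
    -d "${1:?database name required}" -e last
}
```

For the database in this thread that would be `restart_from_last_good_epoch test`.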
Always be sure to shut down an EE mode (local data and catalog) database before shutting down the hosts, to avoid issues where Vertica has not finished committing to disk.
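A minimal sketch of that ordering: stop the database with admintools and report whether it is safe to power the hosts off. The wrapper name and the ADMINTOOLS override are hypothetical; run it as the database administrator on one node before shutting any host down:

```shell
# Hypothetical wrapper: stop database $1 cleanly and only then report that the
# hosts may be powered off. Returns non-zero if stop_db fails.
stop_db_cleanly() {
  if "${ADMINTOOLS:-/opt/vertica/bin/admintools}" -t stop_db -d "${1:?database name required}"; then
    echo "database $1 stopped; hosts can now be shut down"
  else
    echo "stop_db failed; do not power off the hosts yet" >&2
    return 1
  fi
}
```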
Why is this service oneshot? I want a node that is down to be restarted automatically, and I couldn't find any reason why verticad can't do this periodically on each node: adminTools already implements the (re)start policy properly through "command_host".
The reason for the problem above (verticad not working): a "oneshot" service is finished as soon as "/opt/vertica/sbin/verticad start" exits, and KillMode defaults to 'control-group'. That means every process still running in the service's control group, including the vertica and spread processes just started, is killed with SIGTERM.
Accordingly, I found this in vertica.log:
Spread Mailbox Dequeue:0x7f9b84ff9700 [Comms] <WARNING> error SP_receive: Connection closed by spread
Spread Mailbox Dequeue:0x7f9b84ff9700 [Comms] <WARNING> error SP_receive: The network socket experienced an error. This Spread mailbox will no longer work until the connection is disconnected and then reconnected
and in spread.log:
*** spread pid=*** received termination signal 15
One possible solution is to set KillMode=process (or KillMode=none) in the [Service] section of verticad.service.
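One way to apply that setting without editing the unit file itself is a systemd drop-in. A sketch, assuming the unit is named verticad.service as in the status output above; the drop-in file name killmode.conf, the function name and the UNIT_DIR override (used only for testing) are hypothetical. Run `systemctl daemon-reload` afterwards so systemd picks it up:

```shell
# Hypothetical helper: write a drop-in that sets KillMode=process for verticad.service,
# so only the (already exited) main process is killed when the oneshot unit finishes,
# and the vertica and spread processes it started are left running.
write_killmode_dropin() {
  dir="${UNIT_DIR:-/etc/systemd/system}/verticad.service.d"
  mkdir -p "$dir" || return 1
  cat > "$dir/killmode.conf" <<'EOF'
[Service]
KillMode=process
EOF
}
```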
I apologise for hijacking someone else's topic, but I am having exactly the same issue on CentOS 7: this verticad systemd service just does not work.
I'm using the Vertica 10.0.0 RPM.
When issuing:
systemctl start verticad
the following entries are written to the log file (/opt/vertica/log/verticad.log):
If I run the same command as the dbadmin user:
'/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/tools/ATMain.py' -t command_host -cstart
I get the same errors:
And the database does not start.
Raul
Raul, this feature does not automatically restart nodes if the entire database is DOWN:
https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/AdministratorsGuide/AdminTools/SettingTheRestartPolicy.htm
You should be able to start the Vertica nodes manually using /opt/vertica/bin/vertica.
Below is an example, and I also attached an old script that will start all nodes (and thus the entire DB). I gave it to a client in the past to restart their cluster.
OK, thanks. I changed the restart policy to:
Always: Node on a single node database is restarted automatically.
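The restart policy can also be set from the command line with the admintools set_restart_policy tool. A sketch for the database test from this thread; the wrapper name and the ADMINTOOLS override are hypothetical, and the accepted policy values (never, ksafe, always) should be checked with `admintools -t set_restart_policy --help` on your version:

```shell
# Hypothetical wrapper around "admintools -t set_restart_policy".
# $1 = database name, $2 = policy (never | ksafe | always).
set_vertica_restart_policy() {
  case "$2" in
    never|ksafe|always) ;;
    *) echo "unknown policy: $2" >&2; return 1 ;;
  esac
  "${ADMINTOOLS:-/opt/vertica/bin/admintools}" -t set_restart_policy \
    -d "${1:?database name required}" -p "$2"
}
```

For example, `set_vertica_restart_policy test always`.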
Now, the systemd start command works.
systemctl start verticad
But the service stays in the inactive state, even though the database is actually up and running:
Raul
The service type is oneshot, so the ExecStart command runs only once, when the service starts.
Any idea how to change this so that it is not a oneshot service?
Raul
Ah, never mind.
vi /etc/systemd/system/verticad.service
I added:
And then:
systemctl daemon-reload
Now I can stop/start with systemctl
Raul
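The thread does not show what Raul added to the unit file. For readers with the same symptom, one standard systemd setting that keeps a oneshot unit reported as active (exited) after its ExecStart command finishes is RemainAfterExit=yes. A sketch that adds it through a drop-in, assuming that is the behaviour wanted; the file name remain.conf, the function name and the UNIT_DIR / SYSTEMCTL overrides are hypothetical. What `systemctl stop` then does to the database depends on ExecStop and KillMode, which this sketch leaves unchanged:

```shell
# Hypothetical helper: keep verticad.service "active (exited)" after
# "verticad start" returns by adding RemainAfterExit=yes in a drop-in,
# then ask systemd to reload its unit files.
write_remain_dropin() {
  dir="${UNIT_DIR:-/etc/systemd/system}/verticad.service.d"
  mkdir -p "$dir" || return 1
  printf '[Service]\nRemainAfterExit=yes\n' > "$dir/remain.conf" &&
    "${SYSTEMCTL:-systemctl}" daemon-reload
}
```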
Periodic retries with a systemd timer:
```ini
# /etc/systemd/system/verticad.service
[Unit]
Description=Vertica server restart

[Service]
Type=oneshot
ExecStart=/opt/vertica/sbin/verticad start
SuccessExitStatus=1
KillMode=none
```

```ini
# /etc/systemd/system/verticad.timer
[Unit]
Description=Vertica server restart timer

[Timer]
OnUnitActiveSec=10min
OnBootSec=1min

[Install]
WantedBy=multi-user.target
```
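To put the two units above into effect, reload systemd and enable and start only the timer: the service has no [Install] section and is meant to be triggered by the timer. With the timer as written, verticad start runs about a minute after boot and then again 10 minutes after each previous activation. A sketch; enable and start are issued as separate steps, which also works on older systemd releases, and the function name and SYSTEMCTL override (for testing) are hypothetical:

```shell
# Hypothetical helper: activate verticad.timer as defined above.
# Only the timer is enabled; verticad.service is started by the timer.
enable_verticad_timer() {
  sc="${SYSTEMCTL:-systemctl}"
  "$sc" daemon-reload &&
    "$sc" enable verticad.timer &&
    "$sc" start verticad.timer &&
    "$sc" list-timers verticad.timer
}
```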