systemd unit verticad.service fails to start local vertica instance on boot - fix
This post covers the issue where verticad.service fails to start the local vertica instance on host boot. The cause is a race condition: verticad.service is allowed to start before network.target.
To elaborate:
1. verticad.service is a wrapper for /opt/vertica/sbin/verticad
2. verticad calls /opt/vertica/bin/admintools as the local admin user
3. admintools establishes the local node's identity from a local ip address (AFAICT)
So admintools will exit if no network ip is available (yet).
For example, using admintools with no local ip address (don't try this without console access):
# systemctl stop network
# ip a
1: lo: mtu 65536 qdisc noqueue state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0@if13: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether 00:18:5e:c2:d6:0b brd ff:ff:ff:ff:ff:ff link-netnsid 0
# su - dbadmin
$ /opt/vertica/bin/admintools -t command_host -c start
Considering command 'start' for database VMart
This host not in database 'VMart', moving on
This is the same error seen in /opt/vertica/log/verticad.log
You can confirm that verticad.service is starting before network.target with:
# systemd-analyze critical-chain verticad.service network.target
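Another way to look at the ordering, as a rough sketch: systemctl show -p After verticad.service prints the unit's effective After= list on a single line, and that line can be searched for network.target. The helper name after_list_contains below is made up for illustration and is not part of systemd or vertica.

```shell
#!/bin/sh
# after_list_contains is a hypothetical helper, not part of systemd or vertica.
# $1: the single-line output of `systemctl show -p After UNIT`,
#     e.g. "After=basic.target system.slice"
# $2: the unit name to look for
# Succeeds only if $2 appears in the list as a whole word.
after_list_contains() {
    list="${1#After=}"          # drop the leading "After=" if present
    for unit in $list; do       # the list is space-separated
        [ "$unit" = "$2" ] && return 0
    done
    return 1
}
```

Possible usage on the affected host:
# after_list_contains "$(systemctl show -p After verticad.service)" network.target && echo ordered || echo "not ordered"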
To fix, have verticad.service wait for network.target. To add the ordering dependency without modifying the vertica installation, run:
# systemctl edit verticad.service
and add:
[Unit]
After=network.target
or the same but as an ansible snippet:
- file:
    path: "/etc/systemd/system/verticad.service.d"
    state: "directory"
    owner: "root"
    group: "root"
    mode: "u=rwx,g=rx,o=rx"
- ini_file:
    path: "/etc/systemd/system/verticad.service.d/override.conf"
    create: yes
    section: "Unit"
    option: "After"
    value: "network.target"
    owner: "root"
    group: "root"
    mode: "u=rw,g=r,o=r"
- name: systemctl daemon-reload
  systemd: daemon_reload=yes
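For hosts without ansible, the same drop-in can be written from a plain shell. This is only a sketch: write_after_dropin is a hypothetical helper name, and the drop-in path is the one used in the ansible snippet above.

```shell
#!/bin/sh
# write_after_dropin is a hypothetical helper, not part of systemd or vertica.
# $1: drop-in directory, e.g. /etc/systemd/system/verticad.service.d
# $2: unit to order after, e.g. network.target
# Creates the directory if needed and (over)writes override.conf in it.
write_after_dropin() {
    mkdir -p "$1" || return 1
    printf '[Unit]\nAfter=%s\n' "$2" > "$1/override.conf"
}
```

Possible usage as root, followed by a reload so systemd picks up the new file:
# write_after_dropin /etc/systemd/system/verticad.service.d network.target
# systemctl daemon-reload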
HTH
PS. I've opened SD02791557 if anyone wants to nudge it along.