autostart database

aleksss55 Community Edition User

Hello, I have a Vertica cluster with 3 hosts. I want the database to start automatically when the hosts boot (that is, after all hosts were shut down and I turn them back on). On every host I added a script, myscript.sh, to /etc/init.d/:
sudo -u myuser /opt/vertica/bin/admintools -t start_db -d test

When I run this script manually, it works and the database starts. But when the script is launched at OS startup, the database does not start. adminTools.log shows no startup errors, only "pexpecting vsql command..." and "All nodes in db test are in state DOWN":

2020-06-08 16:42:08.526 agent/752:0x7f195dffb700 [vsql._just_connect] <INFO> pexpecting vsql command: /opt/vertica/bin/vsql --no-vsqlrc -n  -p 5433 -U myuser -h 192.168.0.5 test -P pager -A
2020-06-08 16:42:08.984 agent/752:0x7f195dffb700 [vsql._just_connect] <INFO> pexpecting vsql command: /opt/vertica/bin/vsql --no-vsqlrc -n  -p 5433 -U myuser -h 192.168.0.6 test -P pager -A
2020-06-08 16:42:09.459 agent/752:0x7f195dffb700 [vsql._just_connect] <INFO> pexpecting vsql command: /opt/vertica/bin/vsql --no-vsqlrc -n  -p 5433 -U myuser -h 192.168.0.7 test -P pager -A
2020-06-08 16:43:05.639 admintools/3701:0x7f456298c740 [adminExec.getCollapsedClusterState] <INFO> All nodes in db test are in state DOWN

Why is that?
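A boot-time script that runs start_db as soon as its own host is up may fire before the network or the other two hosts are reachable, so a single attempt can fail. Below is a minimal sketch of a more patient start script. It assumes the script runs on one host only, as the database owner, and that `admintools -t start_db` exits non-zero when the start fails. `start_db_with_retries` and the `ADMINTOOLS` override are hypothetical names, not part of Vertica.

```shell
#!/bin/sh
# Hypothetical helper (not part of Vertica): retry start_db because peer
# hosts may still be booting when this runs.
# Assumption: admintools returns a non-zero exit code when start_db fails.

ADMINTOOLS="${ADMINTOOLS:-/opt/vertica/bin/admintools}"

start_db_with_retries() {
    local db="$1" attempts="${2:-10}" delay="${3:-30}" i=1
    while [ "$i" -le "$attempts" ]; do
        if "$ADMINTOOLS" -t start_db -d "$db"; then
            echo "start_db succeeded for $db on attempt $i"
            return 0
        fi
        echo "start_db failed for $db (attempt $i of $attempts)" >&2
        # Wait before the next attempt, but not after the last one
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage, on ONE host only, as the database owner (myuser in the question):
#   start_db_with_retries test 10 30
```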

Answers

  • Bryan_H Vertica Employee Administrator

    It's recommended to start the database from only one node rather than adding the script to every node. Also, it's probably best to start it through systemd rather than calling admintools directly: there should be a script, /etc/init.d/verticad, installed for this; if not, check /opt/vertica/sbin for the init.d and systemd scripts.
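A quick way to see which of the two boot scripts mentioned here is installed. This is a sketch: `find_verticad` and its optional filesystem-prefix argument are hypothetical, and only the two paths come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: locate the verticad boot script in the two places
# named in this thread. The optional first argument is a filesystem prefix
# (useful for testing); leave it empty to search the real system.

find_verticad() {
    local root="${1:-}" candidate
    for candidate in "$root/etc/init.d/verticad" "$root/opt/vertica/sbin/verticad"; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "verticad not found; check $root/opt/vertica/sbin for init.d and systemd scripts" >&2
    return 1
}

# Usage (as root):
#   script=$(find_verticad) && "$script" status
```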

  • aleksss55 Community Edition User

    I left the script in /etc/init.d/ on only one host, but the problem persisted.

  • Bryan_H Vertica Employee Administrator

    If you run /etc/init.d/verticad start, does it work?
    If so, use the command in that file, which is:
    su -l "$user" -c "${vertica_prefix}/bin/adminTools -t command_host -c $1"
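A sketch of that call wrapped in a function so it can be tried by hand. It assumes the default /opt/vertica prefix and a database administrator named dbadmin. `vertica_command_host`, `VERTICA_PREFIX` and the `DRY_RUN` switch are hypothetical, not settings of the shipped verticad script.

```shell
#!/bin/sh
# Hypothetical wrapper: run the same admintools call that verticad makes,
# as the database administrator. Needs root because of "su -l".
# VERTICA_PREFIX and DRY_RUN are illustrative knobs, not Vertica settings.

VERTICA_PREFIX="${VERTICA_PREFIX:-/opt/vertica}"

vertica_command_host() {
    local action="$1" user="${2:-dbadmin}" cmd
    cmd="${VERTICA_PREFIX}/bin/adminTools -t command_host -c $action"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the command instead of running it
        echo "su -l $user -c \"$cmd\""
        return 0
    fi
    su -l "$user" -c "$cmd"
}

# Usage (as root): vertica_command_host start dbadmin
```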

  • aleksss55 Community Edition User

    If you run /etc/init.d/verticad start, does it work?

    No, the verticad service doesn't work.
    I ran: systemctl start verticad
    Then I ran: systemctl status verticad
    verticad.service - Vertica server restart oneshot
    Loaded: loaded (/etc/systemd/system/verticad.service; enabled; vendor preset: disabled)
    Active: inactive (dead) since Thu 2020-06-11 07:19:43 MSK; 38min ago
    Process: 650 ExecStart=/opt/vertica/sbin/verticad start (code=exited, status=0/SUCCESS)
    Main PID: 650 (code=exited, status=0/SUCCESS)

        Jun 11 07:19:09 verticaserv1 systemd[1]: Starting Vertica server restart oneshot...
        Jun 11 07:19:13 verticaserv1 su[706]: (to mydba) root on none
        Jun 11 07:19:43 verticaserv1 verticad[650]: Vertica: start OK for users: mydba
        Jun 11 07:19:43 verticaserv1 verticad[650]: [  OK  ]
        Jun 11 07:19:43 verticaserv1 systemd[1]: Started Vertica server restart oneshot.
    

    In the logs (/opt/vertica/log/verticad.log and /var/log/messages) I see only:
    vertica process is not running
    Vertica: start not OK

  • Bryan_H Vertica Employee Administrator

    Two other places to check: /opt/vertica/log/adminTools.log, since every startup method wraps admintools; and, if nothing shows up there (or to confirm what the admintools log reports), the node's catalog directory, whose startup.log shows more detail.
    Usually when I run into this, it means one node didn't shut down cleanly, and you need to roll back failed transactions by running admintools -t restart_db with additional recovery options.
    Always be sure to shut down an EE mode (local data and catalog) database before shutting down the hosts, to avoid issues where Vertica has not finished committing to disk.
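A sketch for gathering the logs mentioned above on a node. The directory passed in is an assumption (the parent of the node catalog directories), and `find_startup_logs` is a hypothetical helper. The restart_db line in the comments is an example of a recovery option; verify it against the admintools documentation for your version before using it.

```shell
#!/bin/sh
# Hypothetical helper: list startup.log files under a database directory
# (assumed to be the parent of the node catalog directories), sorted by path.

find_startup_logs() {
    find "$1" -type f -name startup.log 2>/dev/null | sort
}

# Usage, on each node (the database path is an assumption):
#   for f in $(find_startup_logs /home/dbadmin/test); do echo "== $f"; tail -n 50 "$f"; done
#   tail -n 100 /opt/vertica/log/adminTools.log
# Example recovery after an unclean shutdown (check options for your version):
#   admintools -t restart_db -d test -e last
```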

  • avkirilishin Vertica Customer

    Why is this service oneshot? I want a down node to be restarted automatically, and I couldn't find any reason why verticad can't do this periodically on each node: adminTools implements the (re)start policy properly with "command_host".

  • avkirilishin Vertica Customer

    The reason for the problem above (verticad not working): the "oneshot" service stops once "/opt/vertica/sbin/verticad start" exits, and KillMode defaults to 'control-group'. That means every process still left in the service's control group, including the Vertica and spread processes it launched, is killed with SIGTERM.

    That is why I found this in vertica.log:
        Spread Mailbox Dequeue:0x7f9b84ff9700 [Comms] <WARNING> error SP_receive: Connection closed by spread
        Spread Mailbox Dequeue:0x7f9b84ff9700 [Comms] <WARNING> error SP_receive: The network socket experienced an error. This Spread mailbox will no longer work until the connection is disconnected and then reconnected
    and in spread.log:
        *** spread pid=*** received termination signal 15

    One possible solution is to set KillMode=process (or none) in the [Service] section of verticad.service.
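The same change can be applied as a systemd drop-in, which leaves the packaged unit file untouched. A minimal sketch: `write_killmode_dropin` and the drop-in file name killmode.conf are hypothetical choices, and `systemctl daemon-reload` is needed afterwards.

```shell
#!/bin/sh
# Hypothetical helper: apply KillMode=process as a systemd drop-in rather than
# editing verticad.service directly. The drop-in file name is arbitrary; the
# unit directory argument exists only so the function can be pointed elsewhere.

write_killmode_dropin() {
    local unit_dir="${1:-/etc/systemd/system}"
    local dropin_dir="$unit_dir/verticad.service.d"
    mkdir -p "$dropin_dir" || return 1
    # With KillMode=process, only the main process is killed when the unit
    # stops, so vertica and spread survive the end of the oneshot start.
    printf '%s\n' '[Service]' 'KillMode=process' > "$dropin_dir/killmode.conf"
}

# Usage (as root):
#   write_killmode_dropin && systemctl daemon-reload
```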

  • raulk89 Community Edition User

    I apologize for hijacking someone else's topic :), but I am having exactly the same issue on CentOS 7: this verticad systemd service just does not work.
    I'm using the Vertica 10.0.0 RPM.

    When I issue:
    systemctl start verticad
    the following entries are written to the log file (/opt/vertica/log/verticad.log):

    Considering command 'start' for database mydb
    failed to fetch one upnode from DBState
    ksafe but DB not up, skipping

    And the following entries are written to the admintools log (/opt/vertica/log/adminTools.log):

    [root.setup_custom_logging] New log for 'admintools'
    [root.setup_custom_logging] sys.argv: '/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/tools/ATMain.py' -t command_host -cstart
    [main.main]

    New invocation of AdminTools at ATMain.py:main()

    [Configurator.init] MAX_ARG_STRLEN is set to: 131072
    [SshAuth.setup_from_admintools_conf_if_needed] Setup complete, default ssh user is dbadmin, default ssh options are -oConnectTimeout=30 -o TCPKeepAlive=no -o ServerAliveInterval=15 -o ServerAliveCountMax=2 -o StrictHostKeyChecking=no -o BatchMode=yes.
    [environment_manager.set_env_vars] Dangerous environment variable MAILCHECK is set from "" to "0"
    [environment_manager.set_env_vars] Dangerous environment variable MAIL is set from "/var/spool/mail/dbadmin" to ""
    [adminExec._is_current_user_conf_owner] Owner of admintools.conf (dbauser) is dbadmin.
    [adminExec._is_current_user_conf_owner] Owner of admintools.conf (dbauser) is dbadmin.
    [root.setup_custom_logging] New log for 'admintools'
    [root.setup_custom_logging] sys.argv: '/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/license_check.py'
    [commandLineCtrl.commandHost] command_host called with arguments ['commandHost', '-cstart']
    [commandLineCtrl.commandHost] processing start
    [commandLineCtrl.commandHost] PROFILING_MARKER: begin|command_host
    [commandLineCtrl.commandHost] Considering command 'start' for database mydb
    [DatabaseState.get_one_up_node] No up node in DB
    [commandLineCtrl._dispatch_command_host] failed to fetch one upnode from DBState
    [adminExec.getRestartPolicy] found restartpolicy dict
    [commandLineCtrl._dispatch_command_host] executing start for DB mydb (policy: ksafe); host 10.51.2.88 node v_mydb_node0001
    [commandLineCtrl._dispatch_command_host] ksafe but DB not up, skipping
    [commandLineCtrl.commandHost] command rc = 0
    [commandLineCtrl.commandHost] PROFILING_MARKER: end|command_host

    If I run the same command as the dbadmin user:
    '/opt/vertica/oss/python3/lib/python3.7/site-packages/vertica/tools/ATMain.py' -t command_host -cstart
    I get the same errors:

    Considering command 'start' for database mydb
    failed to fetch one upnode from DBState
    ksafe but DB not up, skipping

    And the database does not start.

    Raul

  • avkirilishin Vertica Customer

    Raul, this feature does not automatically restart nodes if the entire database is DOWN:
    https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/AdministratorsGuide/AdminTools/SettingTheRestartPolicy.htm
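The restart policy that command_host consults is set per database with admintools. A minimal sketch, assuming `admintools -t set_restart_policy -d <db> -p <policy>` with the policies never, ksafe (the default, and the one in Raul's log) and always. Per the linked page, always applies to a single-node database, so none of the policies brings back a multi-node database whose nodes are all DOWN. `set_vertica_restart_policy` and `DRY_RUN` are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around admintools set_restart_policy. Run it as the
# database administrator. DRY_RUN=1 prints the command instead of running it.

set_vertica_restart_policy() {
    local db="$1" policy="$2" at="/opt/vertica/bin/admintools"
    case "$policy" in
        never|ksafe|always) ;;
        *) echo "unknown policy: $policy (expected never, ksafe or always)" >&2
           return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$at -t set_restart_policy -d $db -p $policy"
        return 0
    fi
    "$at" -t set_restart_policy -d "$db" -p "$policy"
}

# Usage (single-node database only, for "always"):
#   set_vertica_restart_policy mydb always
```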

  • Jim_Knicely - Select Field - Administrator

    You should be able to start the Vertica nodes manually using /opt/vertica/bin/vertica.

    Below is an example. I also attached an old script that starts all nodes (and thus the entire DB); I gave it to a client in the past to restart their cluster.

    [dbadmin@SE-Sandbox-26-node2 ~]$ vsql -c "SELECT node_name, node_address, node_state FROM nodes;"
         node_name      | node_address  | node_state
    --------------------+---------------+------------
     v_test_db_node0001 | 172.16.61.176 | UP
     v_test_db_node0002 | 172.16.61.177 | UP
     v_test_db_node0003 | 172.16.61.178 | UP
    (3 rows)
    
    [dbadmin@SE-Sandbox-26-node2 ~]$ ssh 172.16.61.178 "ps -ef | grep \"[v]ertica -D\""
    dbadmin  410150      1  1 Aug19 ?        01:10:58 /opt/vertica/bin/vertica -D /home/dbadmin/test_db/v_test_db_node0003_catalog -C test_db -n v_test_db_node0003 -h 172.16.61.178 -p 5433 -P 4803 -Y ipv4 -e -S 271
    
    [dbadmin@SE-Sandbox-26-node2 ~]$ ssh 172.16.61.178 "kill -9 410150"
    
    [dbadmin@SE-Sandbox-26-node2 ~]$ vsql -c "SELECT node_name, node_address, node_state FROM nodes;"
         node_name      | node_address  | node_state
    --------------------+---------------+------------
     v_test_db_node0001 | 172.16.61.176 | UP
     v_test_db_node0002 | 172.16.61.177 | UP
     v_test_db_node0003 | 172.16.61.178 | DOWN
    (3 rows)
    
    [dbadmin@SE-Sandbox-26-node2 ~]$ ssh 172.16.61.178 "/opt/vertica/bin/vertica -D /home/dbadmin/test_db/v_test_db_node0003_catalog -C test_db -n v_test_db_node0003 -h 172.16.61.178 -p 5433 -P 4803 -Y ipv4 -e -S 271"
    Connecting to spread at 4803
    /===========================================================================\
    | The Spread Toolkit                                                        |
    | Copyright (c) 1993-2016 Spread Concepts LLC                               |
    | All rights reserved.                                                      |
    |                                                                           |
    | The Spread toolkit is licensed under the Spread Open-Source License.      |
    | You may only use this software in compliance with the License.            |
    | A copy of the license can be found at http://www.spread.org/license       |
    |                                                                           |
    | This product uses software developed by Spread Concepts LLC for use       |
    | in the Spread toolkit. For more information about Spread,                 |
    | see http://www.spread.org                                                 |
    |                                                                           |
    | This software is distributed on an "AS IS" basis, WITHOUT WARRANTY OF     |
    | ANY KIND, either express or implied.                                      |
    |                                                                           |
    | Creators:                                                                 |
    |    Yair Amir             yairamir@cs.jhu.edu                              |
    |    Michal Miskin-Amir    michal@spreadconcepts.com                        |
    |    Jonathan Stanton      jonathan@spreadconcepts.com                      |
    |    John Schultz          jschultz@spreadconcepts.com                      |
    |                                                                           |
    | Major Contributors:                                                       |
    |    Amy Babay            babay@cs.jhu.edu - accelerated ring protocol.     |
    |    Ryan Caudy           rcaudy@gmail.com - contribution to process groups.|
    |    Claudiu Danilov      claudiu@acm.org - scalable, wide-area support.    |
    |    Cristina Nita-Rotaru crisn@cs.purdue.edu - GC security.                |
    |    Theo Schlossnagle    jesus@omniti.com - Perl, autoconf, old skiplist.  |
    |    Dan Schoenblum       dansch@cnds.jhu.edu - Java interface.             |
    |                                                                           |
    | Special thanks to the following for discussions and ideas:                |
    |    Ken Birman, Danny Dolev, Jacob Green, Mike Goodrich, Ben Laurie,       |
    |    David Shaw, Gene Tsudik, Robbert VanRenesse.                           |
    |                                                                           |
    | Partial funding provided by the Defense Advanced Research Project Agency  |
    | (DARPA) and the National Security Agency (NSA) 2000-2004. The Spread      |
    | toolkit is not necessarily endorsed by DARPA or the NSA.                  |
    |                                                                           |
    | For a full list of contributors, see Readme.txt in the distribution.      |
    |                                                                           |
    | WWW:     www.spread.org     www.spreadconcepts.com                        |
    | Contact: info@spreadconcepts.com                                          |
    |                                                                           |
    | Version 5.00.03 Built 22/Feb/2020                                         |
    \===========================================================================/
    Conf_load_conf_file: using file: /home/dbadmin/test_db/v_test_db_node0003_catalog/spread.conf
    Conf_load_conf_file: vertica version is 1845
    Setting active IP version to 2
    Configured daemon 'N172016061176' with IP '172.16.61.176'
    Auto-generated virtual ID = '2956791980' for daemon 'N172016061176'
    Daemon 'N172016061176' will have virtual ID = '2956791980'
    Successfully configured Segment 0 [172.16.61.176]:4803 with 1 procs:
                   N172016061176: 172.16.61.176
    Configured daemon 'N172016061177' with IP '172.16.61.177'
    Auto-generated virtual ID = '2973569196' for daemon 'N172016061177'
    Daemon 'N172016061177' will have virtual ID = '2973569196'
    Successfully configured Segment 1 [172.16.61.177]:4803 with 1 procs:
                   N172016061177: 172.16.61.177
    Configured daemon 'N172016061178' with IP '172.16.61.178'
    Auto-generated virtual ID = '2990346412' for daemon 'N172016061178'
    Daemon 'N172016061178' will have virtual ID = '2990346412'
    Successfully configured Segment 2 [172.16.61.178]:4803 with 1 procs:
                   N172016061178: 172.16.61.178
    Starting UDxSideProcess for language C++
       with command line:  /opt/vertica/bin/vertica-udx-C++ 3 v_test_db_node0003-523725:0x2 debug-log-off /home/dbadmin/test_db/v_test_db_node0003_catalog/UDxLogs 4
    
    Starting UDxSideProcess for language C++
       with command line:  /opt/vertica/bin/vertica-udx-C++ 3 v_test_db_node0003-523725:0xd debug-log-off /home/dbadmin/test_db/v_test_db_node0003_catalog/UDxLogs 4
    
    
    ^CKilled by signal 2.
    
    [dbadmin@SE-Sandbox-26-node2 ~]$ vsql -c "SELECT node_name, node_address, node_state FROM nodes;"
         node_name      | node_address  | node_state
    --------------------+---------------+------------
     v_test_db_node0001 | 172.16.61.176 | UP
     v_test_db_node0002 | 172.16.61.177 | UP
     v_test_db_node0003 | 172.16.61.178 | UP
    (3 rows)
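Because the exact command line differs per node, one option is to record it from ps while the node is UP, as in the session above, and replay it later. A sketch, assuming the default /opt/vertica/bin path; `extract_vertica_cmdline` and the node0003.cmd file name are hypothetical, and this is not the attached script.

```shell
#!/bin/sh
# Hypothetical helper: read "ps -ef" style text on stdin and print the vertica
# command line from the binary path to the end of the line.
# Assumes the default install path /opt/vertica/bin/vertica.

extract_vertica_cmdline() {
    # vertica-udx-C++ side processes do not match, because " -D" must follow
    sed -n 's|^.*\(/opt/vertica/bin/vertica -D .*\)$|\1|p'
}

# Usage (hypothetical file name; run while the node is UP):
#   ssh 172.16.61.178 'ps -ef' | extract_vertica_cmdline > node0003.cmd
```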
    
    
    
  • raulk89 Community Edition User

    OK, thanks. I changed the restart policy to always ("Node on a single node database is restarted automatically").
    Now the systemd start command works:
    systemctl start verticad

    But the service stays inactive, even though the database is actually up and running:

    [root@vhost ~]# /opt/vertica/sbin/verticad status
    Vertica: status OK for users: dbadmin

    [root@vhost ~]# systemctl status verticad
    ● verticad.service - Vertica server restart oneshot
    Loaded: loaded (/etc/systemd/system/verticad.service; enabled; vendor preset: disabled)
    Active: inactive (dead) since Mon 2020-08-24 01:20:01 EEST; 20s ago
    Process: 106214 ExecStart=/opt/vertica/sbin/verticad start (code=exited, status=0/SUCCESS)
    Main PID: 106214 (code=exited, status=0/SUCCESS)

    Aug 24 01:19:34 jaak-db-ibm.just.sise systemd[1]: Starting Vertica server restart oneshot...
    Aug 24 01:19:35 jaak-db-ibm.just.sise su[106221]: (to dbadmin) root on none
    Aug 24 01:20:01 jaak-db-ibm.just.sise verticad[106214]: Vertica: start OK for users: dbadmin
    Aug 24 01:20:01 jaak-db-ibm.just.sise verticad[106214]: [ OK ]
    Aug 24 01:20:01 jaak-db-ibm.just.sise systemd[1]: Started Vertica server restart oneshot.

    Raul

  • avkirilishin Vertica Customer
    edited August 2020

    The service type is oneshot, so the ExecStart command runs only once, when the service is started; without RemainAfterExit=yes, systemd then reports the unit as inactive (dead).

  • raulk89 Community Edition User

    Any idea how to change this so that it is not a oneshot service?

    Raul

  • raulk89 Community Edition User

    Ah, never mind.
    vi /etc/systemd/system/verticad.service
    I added, in the [Service] section:

    ExecStop=/opt/vertica/sbin/verticad stop
    RemainAfterExit=yes

    And then ran:
    systemctl daemon-reload

    Now I can stop/start with systemctl.
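Raul's two settings can also live in a drop-in file rather than in the unit itself. A minimal sketch: `write_remain_dropin` and the file name remain.conf are hypothetical. After `systemctl daemon-reload` and the next start, `systemctl status verticad` should report the unit as active (exited).

```shell
#!/bin/sh
# Hypothetical helper: put RemainAfterExit and ExecStop in a drop-in.
# RemainAfterExit=yes keeps the oneshot unit "active (exited)" after
# "verticad start" returns; ExecStop makes "systemctl stop verticad"
# run "verticad stop". The drop-in file name is arbitrary.

write_remain_dropin() {
    local unit_dir="${1:-/etc/systemd/system}"
    local dropin_dir="$unit_dir/verticad.service.d"
    mkdir -p "$dropin_dir" || return 1
    printf '%s\n' '[Service]' 'RemainAfterExit=yes' \
        'ExecStop=/opt/vertica/sbin/verticad stop' > "$dropin_dir/remain.conf"
}

# Usage (as root):
#   write_remain_dropin && systemctl daemon-reload
```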

    Raul

  • avkirilishin Vertica Customer
    edited August 2020

    Periodic retries:

    # /etc/systemd/system/verticad.service
    [Unit]
    Description=Vertica server restart

    [Service]
    Type=oneshot
    ExecStart=/opt/vertica/sbin/verticad start
    SuccessExitStatus=1
    KillMode=none

    # /etc/systemd/system/verticad.timer
    [Unit]
    Description=Vertica server restart timer

    [Timer]
    OnUnitActiveSec=10min
    OnBootSec=1min

    [Install]
    WantedBy=multi-user.target
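The timer only runs once it is enabled (for example `systemctl enable --now verticad.timer` after `systemctl daemon-reload`). One caveat when combining it with the RemainAfterExit=yes change earlier in the thread: systemd does not restart a unit that is still active when its timer elapses, so the periodic retry would effectively run only once. A sketch of a pre-flight check; `timer_compatible` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical pre-flight check: fail if verticad.service or any of its
# drop-ins enables RemainAfterExit, which would stop verticad.timer from
# re-running the oneshot service.

timer_compatible() {
    local unit_dir="${1:-/etc/systemd/system}" f
    for f in "$unit_dir/verticad.service" "$unit_dir"/verticad.service.d/*.conf; do
        [ -f "$f" ] || continue
        if grep -Eiq '^[[:space:]]*RemainAfterExit[[:space:]]*=[[:space:]]*(yes|true|1|on)[[:space:]]*$' "$f"; then
            echo "RemainAfterExit is enabled in $f; verticad.timer would not re-run the service" >&2
            return 1
        fi
    done
    return 0
}

# Usage (as root), after creating both unit files:
#   timer_compatible && systemctl daemon-reload && systemctl enable --now verticad.timer
#   systemctl list-timers verticad.timer
```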
