HP OBR: Vertica Database Will Not Start

I have a lab HP OBR server with a Vertica DB on Linux. In short, the Vertica DB will not start, which causes several HP OBR App Server processes not to start. I am seeing this in the dbLog:

 

cannot connect to /opt/vertica/config/local-spread.conf

 

FYI, this file does not exist on my Vertica DB server.

 

SP_connect: unable to connect via UNIX socket  to /tmp/4803 (pid=13528): Error: Connection Refused.

 

No idea what this last one means, but it looks pretty ugly.

 

Any help would be appreciated. Thanks!

 

~ Mike

Comments

  • SruthiA (Vertica Employee, Administrator)

    Hi,

    Did you upgrade Vertica recently? Is this your first install?

    Sruthi

  •

    First install, no upgrades. The OBR VM runs Windows and the Vertica DB VM runs Linux. Both VMs were powered down, and since they were brought back up, Vertica will not start. No recent changes of any kind.

  •

    Another hint is in vertica.log: during start up, over and over, I am getting this error:

     

    SP_connect: Error: Unable to connect via UNIX socket to /tmp/4803

     

    /tmp/4803 exists and /tmp is 1% full.

     

    Any ideas?

     

    UPDATE:

     

    It seems this involves a component called Spread, which binds to port 4803 when Vertica starts. The file /tmp/4803 exists but is zero bytes in size, and from what I've read online, Spread and Vertica write files to /tmp that they need in order to initialize. The problem is that Linux/UNIX servers are commonly set up to clear /tmp on reboot, so the reboot may have removed something that Spread needs to start. Online research suggests the only fix is to recompile, but I have no idea how to do that.

     

    Any feedback would be appreciated.

  •

    Hi,

     

    Presence of /tmp/4803 indicates that the user that starts Vertica can create this socket under /tmp.

     

    Please try the following

     

    1. Stop Vertica

    2. Remove /tmp/4803

    3. Start Vertica

     

    -Gayatri

  •

    Hi,

    It may be related to a port conflict. Spread needs port 4803 to be available, so make sure that port is not occupied by another process.

    Thanks

  •

    I ran netstat -anp yesterday, before creating this post, and nothing is listening on this port. There does not seem to be a conflict.
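
The same check can be scripted. The following is a minimal sketch, not part of Vertica's tooling: `can_bind` is a hypothetical helper that tries to bind a socket to the port and reports whether the bind succeeded. It assumes a failed bind means another socket already holds the port, although a bind can fail for other reasons as well.

```python
import socket


def can_bind(port, host="0.0.0.0", kind=socket.SOCK_DGRAM):
    """Return True if a socket of the given kind can bind to host:port right now."""
    with socket.socket(socket.AF_INET, kind) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Usually EADDRINUSE: another socket already owns the port.
            return False
    return True


if __name__ == "__main__":
    # 4803 is the Spread port discussed in this thread.
    for label, kind in (("TCP", socket.SOCK_STREAM), ("UDP", socket.SOCK_DGRAM)):
        state = "free" if can_bind(4803, kind=kind) else "in use or not bindable"
        print(f"{label} port 4803: {state}")
```

If both lines report the port as free while Vertica is stopped, a port conflict is unlikely to be the cause.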

     

    I tried the other suggestion. Vertica is not running, so I did not need to stop it. ;-)

     

    I changed the name of /tmp/4803 to /tmp/4803.01.26.2016 and started verticad. It still fails.

     

    Are there any logs that might give me a hint as to how to solve this?

     

    Online research indicates that this means Vertica needs to be re-compiled?

     

    How do you recompile Vertica?

  •

    Hi,

     

    How are you starting Vertica? What version of Vertica are you running? With version 7.x, spread.conf is under the database catalog directory. The earlier error indicates that it is trying to look for spread.conf under /opt/vertica/config.

     

    These are the logs to review for startup issues

     

    1. dbLog

    2. startup.log

    3. vertica.log

     

    2 and 3 are in the catalog directory.

     

     

    Thanks

  •

    There is no Vertica Version in spread.conf, unless you are looking for this entry:

     

    ActiveIPVersion = IPv4

     

    This is in vertica.log:

     

    Starting up Vertica Analytic Database v7.1.2-0

     

    Logs zipped and attached.

     

    I believe this is a key entry in vertica.log ~ it's happening over and over in this log:

     

    2016-01-25 18:49:31.099 Init Session:0x7fa874012c70 <LOG> @v_pmdb_node0001: 00000/6014: Running load balance policy: roundrobin
    2016-01-25 18:49:31.099 Init Session:0x7fa874012c70 <LOG> @v_pmdb_node0001: 00000/6033: Suggested load balance target node is :0
    2016-01-25 18:49:31.099 Init Session:0x7fa874012c70 <LOG> @v_pmdb_node0001: 00000/5789: Connection load balance request refused by server
    2016-01-25 18:49:31.099 Init Session:0x7fa874017550 <LOG> @v_pmdb_node0001: 00000/6014: Running load balance policy: roundrobin
    2016-01-25 18:49:31.099 Init Session:0x7fa8740139b0 <LOG> @v_pmdb_node0001: 00000/6014: Running load balance policy: roundrobin
    2016-01-25 18:49:31.099 Init Session:0x7fa874017bf0 <LOG> @v_pmdb_node0001: 00000/6014: Running load balance policy: roundrobin
    2016-01-25 18:49:31.099 Init Session:0x7fa874016170 <LOG> @v_pmdb_node0001: 00000/6014: Running load balance policy: roundrobin
    2016-01-25 18:49:31.099 Init Session:0x7fa8740139b0 <LOG> @v_pmdb_node0001: 00000/6033: Suggested load balance target node is :0
    2016-01-25 18:49:31.099 Init Session:0x7fa874017550 <LOG> @v_pmdb_node0001: 00000/6033: Suggested load balance target node is :0
    2016-01-25 18:49:31.100 Init Session:0x7fa874017bf0 <LOG> @v_pmdb_node0001: 00000/6033: Suggested load balance target node is :0
    2016-01-25 18:49:31.100 Init Session:0x7fa874016170 <LOG> @v_pmdb_node0001: 00000/6033: Suggested load balance target node is :0
    2016-01-25 18:49:31.100 Init Session:0x7fa874017550 <LOG> @v_pmdb_node0001: 00000/5789: Connection load balance request refused by server
    2016-01-25 18:49:31.100 Init Session:0x7fa8740139b0 <LOG> @v_pmdb_node0001: 00000/5789: Connection load balance request refused by server
    2016-01-25 18:49:31.100 Init Session:0x7fa874017bf0 <LOG> @v_pmdb_node0001: 00000/5789: Connection load balance request refused by server
    2016-01-25 18:49:31.100 Init Session:0x7fa874016170 <LOG> @v_pmdb_node0001: 00000/5789: Connection load balance request refused by server
    2016-01-25 18:49:31.107 Init Session:0x7fa874015430 <FATAL> @v_pmdb_node0001: {SessionRun} 57V03/4149: Node startup/recovery in progress. Not yet ready to accept connections
    LOCATION: initSession, /scratch_a/release/16125/vbuild/vertica/Session/ClientSession.cpp:461
    2016-01-25 18:49:31.107 Init Session:0x7fa874010f10 <FATAL> @v_pmdb_node0001: {SessionRun} 57V03/5785: Cluster Status Request by 10.250.250.22:38730
    HINT: Cluster State: pmdb
    INITIALIZING: 1 of 1 (v_pmdb_node0001)
    ----
    LOCATION: initSession, /scratch_a/release/16125/vbuild/vertica/Session/ClientSession.cpp:438
    2016-01-25 18:49:31.107 Init Session:0x7fa874016eb0 <FATAL> @v_pmdb_node0001: {SessionRun} 57V03/4149: Node startup/recovery in progress. Not yet ready to accept connections
    LOCATION: initSession, /scratch_a/release/16125/vbuild/vertica/Session/ClientSession.cpp:461

  •

    Thanks for the logs.

    I see that Vertica starts Spread, but the Spread daemon then terminates:

     

    2016-01-26 14:41:47.863 unknown:0x7f371a7ad700 [Comms] <INFO> About to launch spread with '/opt/vertica/spread/sbin/spread -c /var/opt/OV/pmdb/v_pmdb_node0001_catalog/spread.conf'
    2016-01-26 14:41:47.873 unknown:0x7f371a7ad700 [Comms] <INFO> forked spread pid=18767, wrote pidfile /var/opt/OV/pmdb/v_pmdb_node0001_catalog/spread.pid
    2016-01-26 14:41:47.873 unknown:0x7f371a7ad700 [Init] <INFO> Listening on port: 5433
    2016-01-26 14:41:47.873 unknown:0x7f371a7ad700 [Init] <INFO> About to fork
    2016-01-26 14:41:47.874 unknown:0x7f371a7ad700 [Init] <INFO> About to fork again
    2016-01-26 14:41:47.878 unknown:0x7f371a7ad700 [Init] <INFO> Completed forking
    2016-01-26 14:41:47.881 unknown:0x7f371a7ad700 [Init] <INFO> Startup [Connecting to Spread] Connecting to spread 4803
    2016-01-26 14:42:17.915 unknown:0x7f371a7ad700 [Init] <INFO> Spread daemon does not appear to be running on 10.250.250.22 -- exiting!

     

    Can you check what is terminating Spread? Perhaps /var/log/messages has something.
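
To narrow a large vertica.log down to its Spread-related entries, a short filter like the one below may help. It is only a sketch: `spread_events` is a hypothetical helper, and the pattern assumes the line layout seen in the excerpt above (timestamp, thread tag, optional `[Component]`, then a `<LEVEL>` marker).

```python
import re

# Matches lines such as:
# 2016-01-26 14:41:47.863 unknown:0x7f371a7ad700 [Comms] <INFO> About to launch spread ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r".*?<(?P<level>[A-Z]+)> (?P<msg>.*)$"
)


def spread_events(lines):
    """Yield (timestamp, level, message) for log lines whose message mentions spread."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "spread" in match.group("msg").lower():
            yield match.group("ts"), match.group("level"), match.group("msg")


if __name__ == "__main__":
    sample = [
        "2016-01-26 14:41:47.873 unknown:0x7f371a7ad700 [Init] <INFO> Listening on port: 5433",
        "2016-01-26 14:42:17.915 unknown:0x7f371a7ad700 [Init] <INFO> "
        "Spread daemon does not appear to be running on 10.250.250.22 -- exiting!",
    ]
    for ts, level, msg in spread_events(sample):
        print(ts, level, msg)
```

The same function can be fed an open file object for the real log, since it only iterates over lines.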

     

    Thanks

  •

    I did fix that. When I tried deleting /tmp/4803 and restarting before, I was acting as root.

     

    As soon as I removed that file and started vertica with the db user, spread started.
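
Since acting as root versus the database user made the difference here, it can help to check who owns a leftover /tmp/4803 before removing it. A minimal sketch follows. `describe_path` is a hypothetical helper, and the idea that a leftover file with the wrong owner was blocking Spread is an inference from this thread, not something the logs confirm.

```python
import os
import pwd
import stat


def describe_path(path):
    """Return type, owner, and size details for path, or None if it does not exist."""
    try:
        info = os.lstat(path)
    except FileNotFoundError:
        return None
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        # No passwd entry for this uid; fall back to the numeric id.
        owner = str(info.st_uid)
    return {
        "is_socket": stat.S_ISSOCK(info.st_mode),
        "owner": owner,
        "size": info.st_size,
    }


if __name__ == "__main__":
    # /tmp/4803 is the path reported in the SP_connect errors above.
    print(describe_path("/tmp/4803"))
```

If the reported owner is not the database user, removing the file and starting Vertica as the database user matches what worked in this case.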

     

    The spread.log contains:

     

    Spread: initialization complete, pid=12345, entering main event loop.

     

    Then, it eventually fails:

     

    Daemon idle, exiting.

     

    Exit caused by Alarm (EXIT)

  •

    Hi,

    Please share the contents of the dbLog file from your system.

    Thanks

  •

    New vertica.log attached. This is AFTER I fixed the spread start up.

  •

    As requested, dbLog is attached, a brand new copy.

  •

    Hi,
    It looks like the Spread process is not able to communicate between nodes.
    Please send the contents of your spread.conf file.

  •

    FYI, I only have a one-node Vertica installation.

     

    UPDATE:

     

    I have these two spread.conf files on my Vertica DB server:

     

    /opt/vertica/spread/daemon/spread.conf


    /var/opt/OV/pmdb/v_pmdb_node0001_catalog/spread.conf

     

    I was referencing the latter, and here is what's in it:

     

    # cat /var/opt/OV/pmdb/v_pmdb_node0001_catalog/spread.conf
    # 1
    # Auto-generated by vertica - do not edit
    ActiveIPVersion = IPv4
    Spread_Segment 10.250.251.255:4803 {
    N010250250022 <IP Address of Vertica DB Server> {
    <IP Address of Vertica DB Server>
    }
    }

    # begin end matter
    ##EventLogFile = /dev/null
    EventLogFile = /var/log/spread.log
    EventTimeStamp = "[%a %d %b %Y %H:%M:%S]"
    DebugFlags = { PRINT EXIT }
    ExitOnIdle = yes

     

    I have no idea what 10.250.251.255 is.
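
For reference, the fields of interest in this spread.conf can be pulled out with a few lines of Python. This is a sketch only: `summarize_spread_conf` is a hypothetical helper, and its regular expressions assume the layout shown above rather than the full Spread configuration grammar.

```python
import re


def summarize_spread_conf(text):
    """Extract the segment address, port, and simple key = value options."""
    summary = {"segment": None, "port": None, "options": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "##EventLogFile"
        seg = re.match(r"Spread_Segment\s+(\S+):(\d+)\s*\{", line)
        if seg:
            summary["segment"] = seg.group(1)
            summary["port"] = int(seg.group(2))
            continue
        opt = re.match(r"(\w+)\s*=\s*(.+)", line)
        if opt:
            summary["options"][opt.group(1)] = opt.group(2).strip()
    return summary


if __name__ == "__main__":
    # Catalog path taken from this thread; adjust for other installations.
    path = "/var/opt/OV/pmdb/v_pmdb_node0001_catalog/spread.conf"
    try:
        with open(path) as handle:
            print(summarize_spread_conf(handle.read()))
    except OSError as err:
        print(f"could not read {path}: {err}")
```

For the file above, the options would include ExitOnIdle set to yes, which may be connected to the "Daemon idle, exiting." line in spread.log, though that link is a guess.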

  •

    Hi,

     

    10.250.251.255 looks like the broadcast address of the vertica node.
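
This is easy to verify with Python's standard ipaddress module. The sketch below uses the node address 10.250.250.22 from vertica.log. The prefix lengths are assumptions, because the thread never states the netmask; `broadcast_for` is a hypothetical helper name.

```python
import ipaddress


def broadcast_for(host_ip, prefix_len):
    """Return the IPv4 broadcast address of the subnet that contains host_ip."""
    interface = ipaddress.ip_interface(f"{host_ip}/{prefix_len}")
    return str(interface.network.broadcast_address)


if __name__ == "__main__":
    # The prefix lengths below are guesses; compare with the node's real netmask.
    for prefix in (24, 23, 22):
        print(f"/{prefix}: {broadcast_for('10.250.250.22', prefix)}")
```

A /23 or /22 prefix yields 10.250.251.255, while a /24 would give 10.250.250.255, so comparing against the netmask actually configured on the node shows whether the Spread_Segment address still matches the network.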

     

    Can you check whether UDP traffic is blocked on the Vertica node? Spread communication is over UDP.

    You can reconfigure Spread for point-to-point (pt2pt) communication and see if that helps:

    Doc reference:

    http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/InstallationGuide/InstallingVertica/RunTheInstallScript.htm

    Run install_vertica with the same options you used the first time, and add: -T -S default

     

    Thanks

  •
    iptables is completely stopped and SELinux is disabled. Again, before this server was powered down, OBR and Vertica worked perfectly.

    I tried restoring to the last known epoch that worked and this also failed.

    Other than reinstalling from scratch, reverting to a VM snapshot previous to the power down, etc. any ideas as to how to fix the current issue?

    Was Vertica developed under the assumption that the server it was deployed on would never be restarted? ;-)
  •
    Also, is there a way to put Vertica and Spread logging into verbose debug to get hints as to what the cause of this issue is?
  •

    Hi,

    I assume your real Spread_Segment section looks like this:

     

    Spread_Segment 10.250.251.255:4803 {
    N010250250022 10.250.250.22  {
    10.250.250.22  
    }
    }

     

    To get the bigger picture, please send your admintools.conf file.

     

    Thanks 

  •

    admintools.conf:

     

    # cat /opt/vertica/config/admintools.conf


    [Configuration]
    last_port = 5433
    tmp_dir = /tmp
    default_base = /home/<User Name>
    format = 3
    install_opts = --hosts '<IP Address of Vertica DB Server>' --dba-user <user name> --dba-user-password '*******' --dba-group <Group Name> --failure-threshold NONE --accept-eula --license '/opt/HP/BSM//PMDB/config/license/00003169_ITOM_SaaS_100TB.dat'
    spreadlog = False
    controlsubnet = default
    controlmode = broadcast

    [Cluster]
    hosts = <IP Address of Vertica DB Server>

    [Nodes]
    node0001 = <IP Address of Vertica DB Server>,/home/<User Name>,/home/<User Name>
    v_pmdb_node0001 = <IP Address of Vertica DB Server>,/var/opt/OV,/var/opt/OV

    [Database:pmdb]
    restartpolicy = always
    port = 5433
    path = /var/opt/OV/pmdb/v_pmdb_node0001_catalog
    nodes = v_pmdb_node0001
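
The two settings in this file that relate to the broadcast versus point-to-point discussion earlier in the thread are controlmode and controlsubnet. A sketch of reading them with Python's standard configparser follows. `control_settings` is a hypothetical helper, and treating admintools.conf as a plain INI file is an assumption based on the layout shown here.

```python
import configparser


def control_settings(conf_text):
    """Return the controlmode and controlsubnet values from admintools.conf text."""
    # Interpolation is disabled so that any "%" in a value is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = parser["Configuration"]
    return {
        "controlmode": section.get("controlmode"),
        "controlsubnet": section.get("controlsubnet"),
    }


if __name__ == "__main__":
    path = "/opt/vertica/config/admintools.conf"
    try:
        with open(path) as handle:
            print(control_settings(handle.read()))
    except OSError as err:
        print(f"could not read {path}: {err}")
```

For the file above this would report controlmode as broadcast and controlsubnet as default, which is consistent with the broadcast address in spread.conf.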

  • SruthiA (Vertica Employee, Administrator)

    Hi,

    After fixing the above issues, what error message do you get when you try to restart Vertica?

    Sruthi

  •

    When I execute:

     

    # service verticad start

     

    It churns for a while, and comes up [Failed] all in red.

     

    I did provide all of the logs, which are attached to several of the messages in this thread.

     

    The spread.log contains:

     

    Spread: initialization complete, pid=12345, entering main event loop.

     

    Then, it eventually fails:

     

    Daemon idle, exiting.

     

    Exit caused by Alarm (EXIT)

     

    This is one example. I've attached the logs again to this message, for your convenience.

     

    OBR and Vertica are still hard down.

     

    If this is not solved very soon, I will just re-image these VMs and start from scratch.

     

    However, it's ridiculous that a known issue with Vertica is that it can and will become corrupted if you reboot a server.

  •

    Hi,

    Please send the contents of your /opt/vertica/log directory, specifically the admintools* files.

  •

    Requested logs are attached. Thanks for any help you might provide.
