HP OBR: Vertica Database Will Not Start
I have a lab HP OBR server with a Vertica DB on Linux. In short, the Vertica DB will not start, which causes several HP OBR App Server processes not to start. I am seeing this in the dbLog:
cannot connect to /opt/vertica/config/local-spread.conf
FYI, this file does not exist in my Vertica DB server.
SP_connect: unable to connect via UNIX socket to /tmp/4803 (pid=13528): Error: Connection Refused.
No idea what this last one means, but it looks pretty ugly.
Any help would be appreciated. Thanks!
~ Mike
Comments
Hi,
Did you upgrade Vertica recently, or is this your first install?
Sruthi
First install, no upgrades. The OBR VM is Windows and the Vertica DB VM is Linux. These 2 VMs were powered down, but when brought back up, Vertica will not start. No recent changes of any kind.
Another hint is in vertica.log: during start up, over and over, I am getting this error:
SP_connect: Error: Unable to connect via UNIX socket to /tmp/4803
/tmp/4803 exists and /tmp is 1% full.
Any ideas?
UPDATE:
It seems that this is a component called Spread that binds to port 4803 on Vertica start up. The file /tmp/4803 is there, but it is zero in size, and from what I've seen online, there is a bunch of stuff that gets written to /tmp that is necessary for spread/vertica to initialize. The problem here is that it's totally normal for Linux/UNIX servers to be set up to remove junk from /tmp on reboot. It seems it removed stuff that is critical for Spread to start. Online research indicates the only way to fix this is to re-compile, but I have no idea how to re-compile.
Any feedback would be appreciated.
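One way to look more closely at /tmp/4803 is to check what kind of file it is and who owns it. On Linux a Unix domain socket normally reports a size of zero, so the size alone does not show that anything was cleaned out of /tmp. The sketch below is only an illustration: `describe_path` is a hypothetical helper name, not a Vertica or Spread tool.

```shell
# Hypothetical helper (not part of Vertica): report whether a path is a
# Unix domain socket and which user owns it.
describe_path() {
  local path=$1
  local kind owner
  if [ -S "$path" ]; then
    kind="socket"
  elif [ -e "$path" ]; then
    kind="not-a-socket"
  else
    echo "missing"
    return 0
  fi
  # The third column of `ls -ld` is the owning user.
  owner=$(ls -ld "$path" | awk '{ print $3 }')
  echo "$kind owner=$owner"
}

# Example: describe_path /tmp/4803
```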
Hi,
Presence of /tmp/4803 indicates that the user that starts Vertica can create this socket under /tmp.
Please try the following:
1. Stop Vertica
2. Remove /tmp/4803
3. Start Vertica
-Gayatri
Hi,
It may be related to a port conflict. Spread needs port 4803 to be available, so make sure the port is not occupied by another process.
Thanks
So, I did a netstat -anp yesterday, before creating this post about Vertica and this port is not listening. There does not seem to be a conflict.
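For anyone repeating this check: `netstat -anp` lists both TCP and UDP sockets, so filtering on the local-address column is one way to confirm nothing is bound to 4803 on either protocol. This is a rough sketch, and `port_listeners` is a made-up helper name that reads netstat output on standard input.

```shell
# Hypothetical helper: read `netstat -an` style output on stdin and print
# the protocol and local address of any TCP/UDP socket bound to the port.
port_listeners() {
  local port=$1
  # Column 1 is the protocol, column 4 is the local address (host:port).
  awk -v port="$port" '$1 ~ /^(tcp|udp)/ && $4 ~ (":" port "$") { print $1, $4 }'
}

# Example (run as root so -p can show process names):
#   netstat -anp | port_listeners 4803
```

No output means no TCP or UDP socket is using that port.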
I tried the other suggestion. Vertica is not running, so I did not need to stop it. ;-)
I changed the name of /tmp/4803 to /tmp/4803.01.26.2016 and started verticad. It still fails.
Are there any logs that might give me a hint as to how to solve this?
Online research indicates that this means Vertica needs to be re-compiled?
How do you recompile Vertica?
Hi,
How are you starting Vertica? What version of Vertica are you running? With version 7.x, spread.conf is under the database catalog directory. The earlier error indicates that it is trying to look for spread.conf under /opt/vertica/config.
These are the logs to review for startup issues:
1. dbLog
2. startup.log
3. vertica.log
2 and 3 are in the catalog directory.
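These logs can be long, so a quick filter for Spread-related lines may help narrow things down. The sketch below assumes nothing about the log format beyond the word "spread" appearing in the relevant lines; `spread_lines` is a hypothetical helper and the log path is whatever your catalog directory is.

```shell
# Hypothetical helper: show the last N lines (default 20) of a log file
# that mention spread, case-insensitively.
spread_lines() {
  local file=$1
  local count=${2:-20}
  grep -i 'spread' "$file" | tail -n "$count"
}

# Example: spread_lines /path/to/catalog/vertica.log 50
```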
Thanks
There is no Vertica Version in spread.conf, unless you are looking for this entry:
ActiveIPVersion = IPv4
This is in vertica.log:
Starting up Vertica Analytic Database v7.1.2-0
Logs zipped and attached.
I believe this is a key entry in vertica.log ~ it's happening over and over in this log:
2016-01-25 18:49:31.099 Init Session:0x7fa874012c70 <LOG> @v_pmdb_node0001: 00000/6014: Running load balance policy: roundrobin
2016-01-25 18:49:31.099 Init Session:0x7fa874012c70 <LOG> @v_pmdb_node0001: 00000/6033: Suggested load balance target node is :0
2016-01-25 18:49:31.099 Init Session:0x7fa874012c70 <LOG> @v_pmdb_node0001: 00000/5789: Connection load balance request refused by server
2016-01-25 18:49:31.099 Init Session:0x7fa874017550 <LOG> @v_pmdb_node0001: 00000/6014: Running load balance policy: roundrobin
2016-01-25 18:49:31.099 Init Session:0x7fa8740139b0 <LOG> @v_pmdb_node0001: 00000/6014: Running load balance policy: roundrobin
2016-01-25 18:49:31.099 Init Session:0x7fa874017bf0 <LOG> @v_pmdb_node0001: 00000/6014: Running load balance policy: roundrobin
2016-01-25 18:49:31.099 Init Session:0x7fa874016170 <LOG> @v_pmdb_node0001: 00000/6014: Running load balance policy: roundrobin
2016-01-25 18:49:31.099 Init Session:0x7fa8740139b0 <LOG> @v_pmdb_node0001: 00000/6033: Suggested load balance target node is :0
2016-01-25 18:49:31.099 Init Session:0x7fa874017550 <LOG> @v_pmdb_node0001: 00000/6033: Suggested load balance target node is :0
2016-01-25 18:49:31.100 Init Session:0x7fa874017bf0 <LOG> @v_pmdb_node0001: 00000/6033: Suggested load balance target node is :0
2016-01-25 18:49:31.100 Init Session:0x7fa874016170 <LOG> @v_pmdb_node0001: 00000/6033: Suggested load balance target node is :0
2016-01-25 18:49:31.100 Init Session:0x7fa874017550 <LOG> @v_pmdb_node0001: 00000/5789: Connection load balance request refused by server
2016-01-25 18:49:31.100 Init Session:0x7fa8740139b0 <LOG> @v_pmdb_node0001: 00000/5789: Connection load balance request refused by server
2016-01-25 18:49:31.100 Init Session:0x7fa874017bf0 <LOG> @v_pmdb_node0001: 00000/5789: Connection load balance request refused by server
2016-01-25 18:49:31.100 Init Session:0x7fa874016170 <LOG> @v_pmdb_node0001: 00000/5789: Connection load balance request refused by server
2016-01-25 18:49:31.107 Init Session:0x7fa874015430 <FATAL> @v_pmdb_node0001: {SessionRun} 57V03/4149: Node startup/recovery in progress. Not yet ready to accept connections
LOCATION: initSession, /scratch_a/release/16125/vbuild/vertica/Session/ClientSession.cpp:461
2016-01-25 18:49:31.107 Init Session:0x7fa874010f10 <FATAL> @v_pmdb_node0001: {SessionRun} 57V03/5785: Cluster Status Request by 10.250.250.22:38730
HINT: Cluster State: pmdb
INITIALIZING: 1 of 1 (v_pmdb_node0001)
----
LOCATION: initSession, /scratch_a/release/16125/vbuild/vertica/Session/ClientSession.cpp:438
2016-01-25 18:49:31.107 Init Session:0x7fa874016eb0 <FATAL> @v_pmdb_node0001: {SessionRun} 57V03/4149: Node startup/recovery in progress. Not yet ready to accept connections
LOCATION: initSession, /scratch_a/release/16125/vbuild/vertica/Session/ClientSession.cpp:461
Thanks for the logs.
I see that Vertica starts Spread, but the Spread daemon terminates:
2016-01-26 14:41:47.863 unknown:0x7f371a7ad700 [Comms] <INFO> About to launch spread with '/opt/vertica/spread/sbin/spread -c /var/opt/OV/pmdb/v_pmdb_node0001_catalog/spread.conf'
2016-01-26 14:41:47.873 unknown:0x7f371a7ad700 [Comms] <INFO> forked spread pid=18767, wrote pidfile /var/opt/OV/pmdb/v_pmdb_node0001_catalog/spread.pid
2016-01-26 14:41:47.873 unknown:0x7f371a7ad700 [Init] <INFO> Listening on port: 5433
2016-01-26 14:41:47.873 unknown:0x7f371a7ad700 [Init] <INFO> About to fork
2016-01-26 14:41:47.874 unknown:0x7f371a7ad700 [Init] <INFO> About to fork again
2016-01-26 14:41:47.878 unknown:0x7f371a7ad700 [Init] <INFO> Completed forking
2016-01-26 14:41:47.881 unknown:0x7f371a7ad700 [Init] <INFO> Startup [Connecting to Spread] Connecting to spread 4803
2016-01-26 14:42:17.915 unknown:0x7f371a7ad700 [Init] <INFO> Spread daemon does not appear to be running on 10.250.250.22 -- exiting!
Can you check what is terminating Spread? Perhaps /var/log/messages has something.
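One possible way to follow up on this is to pull the most recent Spread pid out of vertica.log and see whether that process still exists. This is a sketch only: it assumes the "forked spread pid=..." wording shown in the excerpt above, and both function names are hypothetical.

```shell
# Hypothetical helper: print the pid from the last "forked spread pid=N"
# line in a Vertica log file.
last_spread_pid() {
  sed -n 's/.*forked spread pid=\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Hypothetical helper: succeed if a process with this pid exists.
# Note: `kill -0` also fails when the process belongs to another user and
# you are not root, so run this as root or as the database user.
pid_running() {
  kill -0 "$1" 2>/dev/null
}

# Example:
#   pid=$(last_spread_pid /path/to/catalog/vertica.log)
#   pid_running "$pid" && echo "spread $pid is running" || echo "spread $pid is gone"
```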
Thanks
I did fix that. When I tried deleting /tmp/4803 and restarting before, I was acting as root.
As soon as I removed that file and started vertica with the db user, spread started.
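Since the difference here was which user created /tmp/4803, a small ownership check before starting Vertica might catch the same situation again. The sketch below is illustrative: `owner_matches` is a hypothetical helper, and the expected user is whatever database user your installation runs as.

```shell
# Hypothetical helper: compare the owner of a path with the user that is
# expected to start Vertica.
owner_matches() {
  local path=$1
  local expected=$2
  local owner
  if [ ! -e "$path" ]; then
    echo "missing"
    return 0
  fi
  # The third column of `ls -ld` is the owning user.
  owner=$(ls -ld "$path" | awk '{ print $3 }')
  if [ "$owner" = "$expected" ]; then
    echo "ok"
  else
    echo "mismatch: owned by $owner, expected $expected"
  fi
}

# Example: owner_matches /tmp/4803 <db user>
```

A "mismatch" result for /tmp/4803 would suggest the file was left behind by a start attempt made as a different user, such as root.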
The spread.log contains:
Spread: initialization complete, pid=12345, entering main event loop.
Then, it eventually fails:
Daemon idle, exiting.
Exit caused by Alarm (EXIT)
Hi,
Please share the contents of the dbLog file from your system.
Thanks
New vertica.log attached. This is AFTER I fixed the spread start up.
As requested, dbLog is attached, a brand new copy.
Hi,
You have a problem with the Spread process not being able to communicate between nodes.
Please send the contents of the spread.conf file.
I only have a one node Vertica installation FYI.
UPDATE:
I have these two spread.conf files on my Vertica DB server:
/opt/vertica/spread/daemon/spread.conf
/var/opt/OV/pmdb/v_pmdb_node0001_catalog/spread.conf
I was referencing the latter, and here is what's in it:
# cat /var/opt/OV/pmdb/v_pmdb_node0001_catalog/spread.conf
# 1
# Auto-generated by vertica - do not edit
ActiveIPVersion = IPv4
Spread_Segment 10.250.251.255:4803 {
N010250250022 <IP Address of Vertica DB Server> {
<IP Address of Vertica DB Server>
}
}
# begin end matter
##EventLogFile = /dev/null
EventLogFile = /var/log/spread.log
EventTimeStamp = "[%a %d %b %Y %H:%M:%S]"
DebugFlags = { PRINT EXIT }
ExitOnIdle = yes
I have no idea what 10.250.251.255 is.
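To pull just that address out of a spread.conf for comparison with the node's network settings, something like the following could work. It is a sketch that assumes the `Spread_Segment <address>:<port> {` layout shown in the listing above; `spread_segment` is a hypothetical helper name.

```shell
# Hypothetical helper: print the address:port of the first Spread_Segment
# entry in a spread.conf file.
spread_segment() {
  awk '$1 == "Spread_Segment" { print $2; exit }' "$1"
}

# Example: spread_segment /var/opt/OV/pmdb/v_pmdb_node0001_catalog/spread.conf
```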
Hi,
10.250.251.255 looks like the broadcast address of the vertica node.
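As a worked example of why that would be the broadcast address: a broadcast address is the host address with all host bits set to 1. The netmask is not given anywhere in this thread, so the /23 prefix (255.255.254.0) below is purely an assumption that happens to fit 10.250.250.22; check the real value with `ip addr` or `ifconfig`. `broadcast_addr` is a hypothetical helper.

```shell
# Hypothetical helper: compute the IPv4 broadcast address for a host
# address and a prefix length (for example 23 for 255.255.254.0).
broadcast_addr() {
  local ip=$1
  local prefix=$2
  local old_ifs=$IFS
  # Split the dotted address into the four positional parameters.
  IFS=.
  set -- $ip
  IFS=$old_ifs
  local ip_int=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  # All host bits set to 1.
  local host_bits=$(( (1 << (32 - prefix)) - 1 ))
  local bcast=$(( ip_int | host_bits ))
  printf '%d.%d.%d.%d\n' \
    $(( (bcast >> 24) & 255 )) $(( (bcast >> 16) & 255 )) \
    $(( (bcast >> 8) & 255 )) $(( bcast & 255 ))
}

# Assuming a /23 network:
#   broadcast_addr 10.250.250.22 23   prints 10.250.251.255
```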
Can you check whether UDP traffic is blocked on the Vertica node? Spread communication is over UDP.
You can reconfigure Spread for point-to-point (pt2pt) communication and see if that helps.
Doc reference:
http://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/InstallationGuide/InstallingVertica/RunTheInstallScript.htm
Run install_vertica with the same options you used the first time, and add: -T -S default
Thanks
I tried restoring to the last known epoch that worked and this also failed.
Other than reinstalling from scratch, reverting to a VM snapshot previous to the power down, etc. any ideas as to how to fix the current issue?
Was Vertica developed under the assumption that the server it was deployed on would never be restarted? ;-)
Hi,
I assume the real Spread_Segment section looks like this:
Spread_Segment 10.250.251.255:4803 {
N010250250022 10.250.250.22 {
10.250.250.22
}
}
To get the bigger picture, please send your admintools.conf file.
Thanks
admintools.conf:
# cat /opt/vertica/config/admintools.conf
[Configuration]
last_port = 5433
tmp_dir = /tmp
default_base = /home/<User Name>
format = 3
install_opts = --hosts '<IP Address of Vertica DB Server>' --dba-user <user name> --dba-user-password '*******' --dba-group <Group Name> --failure-threshold NONE --accept-eula --license '/opt/HP/BSM//PMDB/config/license/00003169_ITOM_SaaS_100TB.dat'
spreadlog = False
controlsubnet = default
controlmode = broadcast
[Cluster]
hosts = <IP Address of Vertica DB Server>
[Nodes]
node0001 = <IP Address of Vertica DB Server>,/home/<User Name>,/home/<User Name>
v_pmdb_node0001 = <IP Address of Vertica DB Server>,/var/opt/OV,/var/opt/OV
[Database:pmdb]
restartpolicy = always
port = 5433
path = /var/opt/OV/pmdb/v_pmdb_node0001_catalog
nodes = v_pmdb_node0001
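The setting most relevant to the earlier broadcast versus point-to-point suggestion is `controlmode` in the [Configuration] section. A small reader like the one below could confirm its value before and after rerunning the installer. This is only a sketch of a generic INI lookup; `ini_value` is a hypothetical helper, not an admintools command.

```shell
# Hypothetical helper: print the value of KEY inside [SECTION] of an
# INI-style file such as admintools.conf.
ini_value() {
  local file=$1
  local section=$2
  local key=$3
  awk -v section="[$section]" -v key="$key" '
    # A line starting with "[" opens a new section.
    /^\[/ { in_section = ($0 == section); next }
    in_section {
      eq = index($0, "=")
      if (eq == 0) next
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)
      # Trim surrounding whitespace from key and value.
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { print v; exit }
    }
  ' "$file"
}

# Example: ini_value /opt/vertica/config/admintools.conf Configuration controlmode
```

With the file shown above, that lookup would print `broadcast`.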
Hi,
After fixing the above issues, what error message do you get when you try to restart Vertica?
Sruthi
When I execute:
# service verticad start
It churns for a while, and comes up [Failed] all in red.
I did provide all of the logs, which are attached to several of the messages in this thread.
The spread.log contains:
Spread: initialization complete, pid=12345, entering main event loop.
Then, it eventually fails:
Daemon idle, exiting.
Exit caused by Alarm (EXIT)
This is one example. I've attached the logs again to this message, for your convenience.
OBR and Vertica are still hard down.
If this is not solved very soon, I will just re-image these VMs and start from scratch.
However, it's ridiculous that a known issue with Vertica is that it can and will become corrupted if you reboot a server.
Hi,
Please send the contents of /opt/vertica/log, specifically the admintools* files.
Requested logs are attached. Thanks for any help you might provide.