spread exit caused by alarm(exit)
Unable to restart Vertica -
When we first checked vertica.log and adminTools-dbadmin.log we did not find anything that might be causing the issue. On a more thorough look we found the warning below:
Spread Mailbox Dequeue:0x6947c00 [Comms] <INFO> Spread dequeue thread exiting
Spread Client:0x6558a40 [Comms] <WARNING> error SP_receive: Illegal spread was provided
Spread Client:0x6558a40 [Comms] <INFO> spread thread exiting
When we run /opt/vertica/spread/sbin/spread -c we get "spread exit caused by alarm(exit)". Any idea what the problem is here?
Comments
Please share Vertica version.
Check your spread.conf file.
This link might help you.
http://stackoverflow.com/questions/1922102/spread-exits-directly-after-starting
Hope this helps.
NC
Thanks for your response.
We have a 3-node cluster. The DB is down on all the nodes.
When we try to bring it up from admintools, it takes too long, showing node 1 as initializing and nodes 2 and 3 as down. It seems unable to initialize nodes 2 and 3.
Please suggest.
Below is the info for our Vertica configuration.
We are using Vertica version vertica-7.0.1-0.x86_64.
Our spread configuration file looks like this:
# 7# Auto-generated by vertica - do not edit
Spread_Segment 16.181.239.255:4803 {
N016181233067 16.181.233.67 {
16.181.233.67
}
N016181233068 16.181.233.68 {
16.181.233.68
}
N016181233069 16.181.233.69 {
16.181.233.69
}
}
# begin end matter
EventLogFile = /dev/null
EventTimeStamp = "[%a %d %b %Y %H:%M:%S]"
DebugFlags = { PRINT EXIT }
ExitOnIdle = yes
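To compare a spread.conf like the one above across nodes, a small Python sketch can pull out the segment address, the port, and the node list. This is only an illustration: the name parse_spread_segment and the SAMPLE string are made up here, and the parser assumes the simple single-segment layout shown in this post, not the full spread.conf grammar.

```python
import re

# Sample text copied from the spread.conf shown in this thread.
SAMPLE = """\
Spread_Segment 16.181.239.255:4803 {
N016181233067 16.181.233.67 {
16.181.233.67
}
N016181233068 16.181.233.68 {
16.181.233.68
}
N016181233069 16.181.233.69 {
16.181.233.69
}
}
"""

def parse_spread_segment(text):
    """Return (segment_address, port, {node_name: ip}) from spread.conf text.

    Hypothetical helper: handles only one Spread_Segment block with
    one 'name ip {' line per node, as in the file above.
    """
    seg = re.search(r"Spread_Segment\s+([\d.]+):(\d+)\s*\{", text)
    if seg is None:
        raise ValueError("no Spread_Segment block found")
    # Node entries look like: N016181233067 16.181.233.67 {
    nodes = dict(re.findall(r"^\s*(\w+)\s+([\d.]+)\s*\{", text, flags=re.M))
    return seg.group(1), int(seg.group(2)), nodes

address, port, nodes = parse_spread_segment(SAMPLE)
print(address, port)
for name, ip in sorted(nodes.items()):
    print(name, ip)
```

Running the same function on the spread.conf from each node's catalog directory and comparing the results is one way to confirm that all three nodes agree on the segment and the member list.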
It seems your node 1 is failing to start.
Check whether the spread daemon is running on node 1. You can check like this: If you don't see a spread process running, you need to start the spread service manually like this: This should start the spread service on your node 1.
Then you can try restarting the database again.
Hope this helps.
NC
You can run this command on every node and see whether it reports any issue:
strace /opt/vertica/spread/sbin/spread -c /home/dbadmin/<dbname>/v_<dbname>_node0001_catalog/spread.conf
Also check that dbadmin has access to the /tmp folder, and whether a file named 4803 is present there.
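The /tmp check can also be scripted. The sketch below is a rough illustration, not a Vertica tool: check_spread_socket is a hypothetical helper name, and it assumes the file in question is the Unix socket /tmp/4803 and that the script is run as dbadmin.

```python
import os
import stat

def check_spread_socket(path="/tmp/4803"):
    """Return a short status string describing the spread socket file."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"
    if not stat.S_ISSOCK(st.st_mode):
        return "not a socket"
    # Read and write access for the user running this script.
    if not os.access(path, os.R_OK | os.W_OK):
        return "no access"
    return "ok"

print(check_spread_socket())
```

A result other than "ok" on a node where spread is supposed to be running would be worth looking into before retrying the database start.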
We can see the spread process running.
But when we go to the folder /etc/init.d/ we do not see any spread file, so we are not able to run the spread start or spread status command. Is the spread file in that folder created at run time?
This is the output on Node 1 and 3:
Conf_load_conf_file: error opening config file /home/dbadmin/analytics/v_analytics_node0002_catalog/spread.conf
Exit caused by Alarm(EXIT)
[dbadmin@TBDA-AE3 ~]$ cd /tmp/
[dbadmin@TBDA-AE3 tmp]$ ll
total 0
srw-rw-rw-. 1 dbadmin verticadba 0 Sep 24 11:35 4803
This is the output on Node 2:
11:38 AM
\===========================================================================/
Conf_load_conf_file: using file: /home/dbadmin/analytics/v_analytics_node0002_catalog/spread.conf
Successfully configured Segment 0 [16.181.239.255:4803] with 3 procs:
N016181233067: 16.181.233.67
N016181233068: 16.181.233.68
N016181233069: 16.181.233.69
11:38 AM
But we could not find the spread.log file on Node 1 and 3, maybe because of the exit alarm.
Below is the spread log on Node 2:
[root@TBDS-AE2 tmp]# cat spread.log
[Tue 23 Sep 2014 12:30:20] Set Alarm mask to: ffffffff
[Tue 23 Sep 2014 12:30:20] Finished configuration file.
[Tue 23 Sep 2014 12:30:20] Hash value for this configuration is: 2200920881
[Tue 23 Sep 2014 12:30:20] Conf_load_conf_file: My name: N016181233068, id: 16.181.233.68, port: 4803
[Tue 23 Sep 2014 12:30:20] E_init: went ok
[Tue 23 Sep 2014 12:30:20] new: creating pointer 0x1a35e70 to object type 2 named pack_head
[Tue 23 Sep 2014 12:30:20] new: creating pointer 0x1a360e0 to object type 50 named packet_body
[Tue 23 Sep 2014 12:30:20] new: creating pointer 0x1a35eb0 to object type 8 named token_head
[Tue 23 Sep 2014 12:30:20] new: creating pointer 0x1a366a0 to object type 9 named token_body
[Tue 23 Sep 2014 12:30:20] new: creating pointer 0x1a35ef0 to object type 8 named token_head
[Tue 23 Sep 2014 12:30:20] new: creating pointer 0x1a36c60 to object type 50 named packet_body
[Tue 23 Sep 2014 12:30:20] Net_init: Bcast needed to address (280358911, 4803)
[Tue 23 Sep 2014 12:30:20] DL_init_channel: bind error (98): Address already in use for port 4803, with sockaddr (16.181.239.255: 4803) probably already running
Exit caused by Alarm(EXIT)
[root@TBDS-AE2 tmp]#
Please let us know what we are missing.
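The Node 2 log above ends with bind error (98), which on Linux is EADDRINUSE: something already holds port 4803, most likely the spread process that is already running. A quick way to confirm that from Python is to try binding the port yourself. This is a sketch under assumptions: udp_port_in_use is a hypothetical helper, and it assumes the conflicting bind is a UDP one on the default wildcard address.

```python
import errno
import socket

def udp_port_in_use(port, host="0.0.0.0"):
    """Return True if binding a UDP socket to (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # any other error (e.g. permissions) is reported as-is
    return False

print(udp_port_in_use(4803))
```

If this prints True while a spread process is listed on the node, starting a second spread by hand would be expected to fail the same way as in the log.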
Check the directory for node 1:
/home/dbadmin/analytics/v_analytics_node0001_catalog/spread.conf
and for node 3:
/home/dbadmin/analytics/v_analytics_node0003_catalog/spread.conf
It looks like spread is OK on node 2.
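In the output posted for nodes 1 and 3, the file that failed to open was the node0002 path, so it may help to list which spread.conf files actually exist on each node before running spread by hand. A minimal sketch, assuming the catalog layout seen in this thread (/home/dbadmin/analytics/v_analytics_nodeNNNN_catalog); find_spread_confs is a hypothetical helper name:

```python
import glob
import os

def find_spread_confs(db_root, dbname):
    """List spread.conf files under <db_root>/<dbname>/v_<dbname>_node*_catalog/."""
    pattern = os.path.join(
        db_root, dbname, f"v_{dbname}_node*_catalog", "spread.conf"
    )
    return sorted(glob.glob(pattern))

# Paths below are the ones used in this thread; adjust for another database.
for path in find_spread_confs("/home/dbadmin", "analytics"):
    status = "readable" if os.access(path, os.R_OK) else "NOT readable"
    print(path, status)
```

Whatever path this prints on a given node is the one to pass to spread -c on that node.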