Spread config files missing in vanilla install
I'm trying to get a basic single-node installation of Vertica working on Oracle Enterprise Linux 6.5. The install completes without issue, but when I try to create a new database, node 1 never starts and the creation eventually fails:
Starting Vertica on all nodes. Please wait, databases with large catalogs may take a while to initialize.
Node Status: v_wl4_node0001: (DOWN)
Node Status: v_wl4_node0001: (DOWN)
Node Status: v_wl4_node0001: (DOWN)
Node Status: v_wl4_node0001: (DOWN)
Node Status: v_wl4_node0001: (DOWN)
Node Status: v_wl4_node0001: (DOWN)
Node Status: v_wl4_node0001: (DOWN)
Node Status: v_wl4_node0001: (DOWN)
Node Status: v_wl4_node0001: (DOWN)
Node Status: v_wl4_node0001: (DOWN)
ERROR: Database did not start cleanly on initiator node!
Stopping all nodes
---
When I begin to debug, it looks like spread is the likely suspect. But in Vertica 7, it looks like I'm missing both /etc/init.d/spreadd and /opt/vertica/conf/spread.conf and vspread.conf. Should I manually create these files?
I previously had a working install on this machine with no issues.
The spread configuration is part of the database catalog, and you can find the configuration file under the catalog directory. You can't edit the file, though: if you do, Vertica will write a new one when the database restarts.
The first thing to debug is to check that the firewall is off.
Eugenia
We have integrated spread into the Vertica installation. The spread executable is in the /opt/vertica/bin directory and the spread configuration file, now named spread.conf, is located in the DB catalog directory. Spread is also spawned by the Vertica process at DB startup instead of running all the time.
You will want to take a look at the vertica.log file, along with the dbLog file in the DB directory. The /opt/vertica/log/admintools-dbadmin.log file will also give you some info on this error.
Did you completely remove the old Vertica installation before installing this instance or is this an upgrade install?
- Mitch
rpm -e vertica-7.0.1-0.x86_64;
userdel dbadmin;
groupdel verticadba;
rm -rf /home/dbadmin;
rm -rf /opt/vertica;
Here's how I installed:
rpm -Uvh vertica-7.0.1-0.x86_64.RHEL5.rpm
/opt/vertica/sbin/install_vertica -s wavelength-dw -L /tmp/license.txt -Y -p '<password>' -P '<password>' --failure-threshold NONE
/opt/vertica/bin/adminTools --tool create_db -s wavelength-dw --database wavelength -p '<password>'
The first and second commands completed successfully, the third errors out.
The hostname, 'wavelength-dw' resolves to the external eth0 IP. I also tried with localhost and by directly setting the IP with no luck. The hostname is in /etc/hosts. I also tried setting the control-network.
While the database is being created, can you check whether the catalog and data directories are created? Look under the database directory for the dbLog file and see if any error is printed there.
You can also find details of the error in the /opt/vertica/log/admintools-dbadmin.log file.
Check those places and see if you find more details.
Eugenia
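A minimal sketch of the log check suggested above. The function name scan_logs and the search pattern are illustrative assumptions, not anything Vertica ships; the files to pass in are the ones named in this thread.

```shell
# scan_logs: print lines that look like errors from each readable log file.
# Hypothetical helper; the pattern below is a guess at what is worth seeing.
scan_logs() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "== $f"
            # grep exits non-zero when nothing matches; that is not a failure here.
            grep -n -i -E 'error|fatal|refused|unable' "$f" || true
        else
            echo "== $f (not found or not readable)"
        fi
    done
    return 0
}
```

For example, assuming dbLog sits directly under the database directory: scan_logs /home/dbadmin/wavelength/dbLog /opt/vertica/log/admintools-dbadmin.log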
Conf_load_conf_file: using file: /home/dbadmin/wavelength/v_wavelength_node0001_catalog/spread.conf
Successfully configured Segment 0 [17.207.162.255:4803] with 1 procs:
N017207162243: 17.207.162.243
03/20/14 15:48:54 SP_connect: unable to connect mailbox 9: Connection refused
03/20/14 15:48:55 SP_connect: unable to connect mailbox 9: Connection refused
[...snip...]
03/20/14 15:49:20 SP_connect: unable to connect mailbox 9: Connection refused
03/20/14 15:49:20 SP_connect: unable to VSpread could not connect on local domain socket 4803: -2
Unable to open indirect spread information: /opt/vertica/config/local-spread.conf
$more /opt/vertica/config/admintools.conf
[Configuration]
last_port = 5433
default_base = /home/dbadmin
format = 3
install_opts = -s '17.207.162.243' -L '/tmp/license.txt' -Y -p '*******' -P '*******' --failure-threshold NONE
spreadlog = False
controlsubnet = default
controlmode = broadcast
[Cluster]
hosts = 17.207.162.243
[Nodes]
node0001 = 17.207.162.243,/home/dbadmin,/home/dbadmin
You can run service iptables status, or you can verify with a netstat test, for example:
# sudo netstat -uatp | grep 480
Also, try to create the database again, and before the database gets removed, grab the spread.conf from the catalog folder. Then we can see whether spread can be started manually after the database creation fails.
To manually test spread:
/opt/vertica/spread/sbin/spread -c spread.conf_that_you_grabbed_before_it_was_removed
Until spread starts successfully, you won't be able to create the database.
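To make the netstat check stricter than a plain grep for 480, here is a small sketch that keeps only the lines whose local address ends in the exact port. The function name spread_listeners is made up for illustration, and it assumes the usual netstat column layout with the local address in the fourth column.

```shell
# spread_listeners PORT: filter netstat output on stdin, keeping rows whose
# local address (4th column) ends in ":PORT". Hypothetical helper.
spread_listeners() {
    awk -v port="$1" '$4 ~ (":" port "$")'
}
```

Usage would be something like: netstat -uatn | spread_listeners 4803. No output means nothing is bound to that port.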
[...header removed...]
Conf_load_conf_file: using file: /home/dbadmin/wavelength/v_wavelength_node0001_catalog/spread.conf
Successfully configured Segment 0 [17.207.162.255:4803] with 1 procs:
N017207162243: 17.207.162.243
----
Command exits and 'netstat -uatp | grep 480' shows nothing, both when trying to create the database and after I manually run spread. Here is the spread.conf it's using:
# 1
# Auto-generated by vertica - do not edit
Spread_Segment 17.207.162.255:4803 {
N017207162243 17.207.162.243 {
17.207.162.243
}
}
# begin end matter
EventLogFile = /dev/null
EventTimeStamp = "[%a %d %b %Y %H:%M:%S]"
DebugFlags = { PRINT EXIT }
ExitOnIdle = yes
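Since the port comes from the generated file, it may help to read it from there rather than hard-coding 4803. This is only a sketch: spread_port is a hypothetical helper, and it assumes the IPv4 Spread_Segment address:port form shown in the file above.

```shell
# spread_port CONF: print the port from the first Spread_Segment line.
# Hypothetical helper; assumes "Spread_Segment <ipv4>:<port> {" as in this thread.
spread_port() {
    sed -n 's/^Spread_Segment[[:space:]]\{1,\}[^:]*:\([0-9]\{1,\}\).*/\1/p' "$1" | head -n 1
}
```

For example: spread_port /home/dbadmin/wavelength/v_wavelength_node0001_catalog/spread.conf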
https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/InstallationGuide/BeforeYouInstall/Ensure...
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1670/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 1545/cupsd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1746/master
tcp 0 0 0.0.0.0:5444 0.0.0.0:* LISTEN 35655/python
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1472/rpcbind
tcp 0 0 0.0.0.0:58484 0.0.0.0:* LISTEN 1490/rpc.statd
tcp 0 0 :::22 :::* LISTEN 1670/sshd
tcp 0 0 ::1:631 :::* LISTEN 1545/cupsd
tcp 0 0 ::1:25 :::* LISTEN 1746/master
tcp 0 0 :::56798 :::* LISTEN 1490/rpc.statd
tcp 0 0 :::111 :::* LISTEN 1472/rpcbind
--
0.0.0.0:5444 is simply_fast.py
udp 0 0 0.0.0.0:68 0.0.0.0:* 1360/dhclient
udp 0 0 0.0.0.0:111 0.0.0.0:* 1472/rpcbind
udp 0 0 17.207.162.243:123 0.0.0.0:* 3734/ntpd
udp 0 0 127.0.0.1:123 0.0.0.0:* 3734/ntpd
udp 0 0 0.0.0.0:123 0.0.0.0:* 3734/ntpd
udp 0 0 0.0.0.0:47652 0.0.0.0:* 1490/rpc.statd
udp 0 0 0.0.0.0:631 0.0.0.0:* 1545/cupsd
udp 0 0 0.0.0.0:799 0.0.0.0:* 1472/rpcbind
udp 0 0 0.0.0.0:818 0.0.0.0:* 1490/rpc.statd
udp 0 0 :::59309 :::* 1490/rpc.statd
udp 0 0 :::111 :::* 1472/rpcbind
udp 0 0 fe80::20c:29ff:fefe:be27:123 :::* 3734/ntpd
udp 0 0 ::1:123 :::* 3734/ntpd
udp 0 0 :::123 :::* 3734/ntpd
udp 0 0 :::799 :::* 1472/rpcbind
# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
/opt/vertica/spread/sbin/spread -c <<THIS SAVED FILE>>
Then check in /tmp/spread.log what is logged.
unlink("/tmp/4803") = -1 EPERM (Operation not permitted)
I did an 'ls' on that file and this was the output:
srw-rw-rw-. 1 501 501 0 Mar 18 01:59 4803
It looks like that file previously belonged to dbadmin:verticadba, and when I reinstalled, I removed that user and group before the file was removed. After the reinstall, the new dbadmin didn't have permission to remove it, and spread quietly exited when the unlink() failed.
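For anyone hitting the same thing, a quick way to spot a leftover socket like this is to compare its owner with the current user. /tmp normally has the sticky bit set, so only the file's owner, the directory's owner, or root can unlink a file there, which matches the EPERM above. The sketch below is an assumption-laden helper (socket_owner_status is a made-up name, and it relies on GNU stat), not part of Vertica.

```shell
# socket_owner_status PATH: report whether PATH is missing, owned by the
# current user, or owned by some other UID. Hypothetical helper; uses GNU stat.
socket_owner_status() {
    path=$1
    if [ ! -e "$path" ]; then
        echo "missing"
        return 0
    fi
    owner_uid=$(stat -c '%u' "$path") || return 1
    if [ "$owner_uid" = "$(id -u)" ]; then
        echo "owned"
    else
        echo "foreign:$owner_uid"
    fi
}
```

Run as dbadmin: socket_owner_status /tmp/4803. A result like foreign:501 would point at a stale file from an earlier install, which presumably has to be removed as root before spread can start.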