Spread config files missing in vanilla install

I'm trying to get a basic single-node installation of Vertica working on Oracle Enterprise Linux 6.5. I can install without issue, but when I try to create a new database, it fails to start up my node1, eventually failing:

Starting Vertica on all nodes. Please wait, databases with large catalogs may take a while to initialize.


Node Status: v_wl4_node0001: (DOWN) 

Node Status: v_wl4_node0001: (DOWN) 

Node Status: v_wl4_node0001: (DOWN) 

Node Status: v_wl4_node0001: (DOWN) 

Node Status: v_wl4_node0001: (DOWN) 

Node Status: v_wl4_node0001: (DOWN) 

Node Status: v_wl4_node0001: (DOWN) 

Node Status: v_wl4_node0001: (DOWN) 

Node Status: v_wl4_node0001: (DOWN) 

Node Status: v_wl4_node0001: (DOWN) 

ERROR:  Database did not start cleanly on initiator node!

        Stopping all nodes

---

When I begin to debug, it looks like spread is the likely suspect. But in Vertica 7, it looks like I'm missing both /etc/init.d/spreadd and /opt/vertica/conf/spread.conf and vspread.conf. Should I manually create these files?

I previously had a working install on this machine with no issues.

Comments

  • In Vertica Analytic Database 7.0, spread is managed at database start/stop time, rather than during the Vertica Analytic Database installation process. This means that the spread process does not run when the database is not running.
    The spread configuration is part of the database catalog, and you can find the configuration file under the catalog directory. You can't change the file, though: if you do, Vertica will write a new one when the database restarts.
    The first thing to check when debugging is that the firewall is off.
    Eugenia 
  • Hi Steven,

    We have integrated spread into the Vertica installation. The spread executable is in the /opt/vertica/bin directory and the spread configuration file, now named spread.conf, is located in the DB catalog directory. Spread is also spawned by the Vertica process upon DB startup instead of always being running.

    You will want to take a look at the vertica.log file, along with the dbLog file in the DB directory. The /opt/vertica/log/admintools-dbadmin.log file will also give you some info on this error.

    Did you completely remove the old Vertica installation before installing this instance or is this an upgrade install?

    - Mitch
  • Here's how I uninstalled:

    rpm -e vertica-7.0.1-0.x86_64;
    userdel dbadmin;
    groupdel verticadba;
    rm -rf /home/dbadmin;
    rm -rf /opt/vertica;

    Here's how I installed:

    rpm -Uvh vertica-7.0.1-0.x86_64.RHEL5.rpm

    /opt/vertica/sbin/install_vertica -s wavelength-dw -L /tmp/license.txt -Y -p '<password>' -P '<password>' --failure-threshold NONE

    /opt/vertica/bin/adminTools --tool create_db -s wavelength-dw --database wavelength -p '<password>'

    The first and second commands completed successfully, the third errors out.

    The hostname, 'wavelength-dw' resolves to the external eth0 IP. I also tried with localhost and by directly setting the IP with no luck. The hostname is in /etc/hosts. I also tried setting the control-network.

  • I also shut down the firewall prior to installing using: service iptables stop
  • What error did it give you? Have you tried creating the database using the Admintools UI instead of the command line?
    While the database is being created, can you check whether the catalog and data directories are created? Check under the database directory for the dbLog file and see if any error is printed there.

    You can also find details of the error in the /opt/vertica/log/admintools-dbadmin.log file.

    Check those places to see if you find more details.

    Eugenia
  • My /home/dbadmin/wavelength/dbLog:

    Conf_load_conf_file: using file: /home/dbadmin/wavelength/v_wavelength_node0001_catalog/spread.conf

    Successfully configured Segment 0 [17.207.162.255:4803] with 1 procs:

          N017207162243: 17.207.162.243

    03/20/14 15:48:54 SP_connect: unable to connect mailbox 9: Connection refused

    03/20/14 15:48:55 SP_connect: unable to connect mailbox 9: Connection refused

    [...snip...]

    03/20/14 15:49:20 SP_connect: unable to connect mailbox 9: Connection refused

    03/20/14 15:49:20 SP_connect: unable to VSpread could not connect on local domain socket 4803: -2

    Unable to open indirect spread information: /opt/vertica/config/local-spread.conf

  • I tried through the GUI with similar results. My adminTools-dbadmin.log: http://pastebin.com/uB8NQy3N
  • (I should also mention that /opt/vertica/config/local-spread.conf doesn't exist in my install)
  • Can you post the output of:

    $ more /opt/vertica/config/admintools.conf

  • admintools.conf:

    [Configuration]

    last_port = 5433

    default_base = /home/dbadmin

    format = 3

    install_opts = -s '17.207.162.243' -L '/tmp/license.txt' -Y -p '*******' -P '*******' --failure-threshold NONE

    spreadlog = False

    controlsubnet = default

    controlmode = broadcast


    [Cluster]

    hosts = 17.207.162.243


    [Nodes]

    node0001 = 17.207.162.243,/home/dbadmin,/home/dbadmin



  • For some reason spread is not starting. Verify that the node's ports are open. Spread needs ports 4803 and 4804.

    You can run service iptables status,
    or you can check with netstat, for example:

    # sudo netstat -uatp | grep 480


    Also, try creating the database again and, before the database gets removed, grab the spread.conf from the catalog folder so we can try to start spread manually after the database creation fails.

    To test spread manually:
    /opt/vertica/spread/sbin/spread -c spread.conf_that_you_grabbed_before_it_was_removed

    Until spread starts successfully, you won't be able to create the database.
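
The port check described above can be sketched as a small filter. This is only an illustration: the function name `spread_ports_in_use` is hypothetical, and it assumes `netstat -atun` style output where the fourth column is the local address.

```shell
#!/bin/sh
# Hypothetical helper (not part of Vertica): read netstat-style output on stdin
# and print the protocol and port for every local address that uses one of
# the spread ports mentioned in this thread (4803 and 4804).
spread_ports_in_use() {
    awk '{
        # Column 4 is the local address, e.g. 0.0.0.0:4803 or :::4803.
        n = split($4, parts, ":")
        port = parts[n]
        if (port == "4803" || port == "4804") print $1, port
    }' | sort -u
}
```

Usage: run `netstat -atun | spread_ports_in_use` while the database is starting. Empty output means nothing is bound to the spread ports at that moment.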




  • [dbadmin@wavelength-dw ~]$ /opt/vertica/spread/sbin/spread -c /home/dbadmin/wavelength/v_wavelength_node0001_catalog/spread.conf 
    [...header removed...]

    Conf_load_conf_file: using file: /home/dbadmin/wavelength/v_wavelength_node0001_catalog/spread.conf

    Successfully configured Segment 0 [17.207.162.255:4803] with 1 procs:

          N017207162243: 17.207.162.243


    ----

    Command exits and 'netstat -uatp | grep 480' shows nothing, both when trying to create the database and after I manually run spread. Here is the spread.conf it's using:

    # 1

    # Auto-generated by vertica - do not edit

    Spread_Segment 17.207.162.255:4803 {

      N017207162243    17.207.162.243 {

        17.207.162.243

      }

    }

    # begin end matter

    EventLogFile = /dev/null

    EventTimeStamp = "[%a %d %b %Y %H:%M:%S]"

    DebugFlags = { PRINT EXIT }

    ExitOnIdle = yes

  • It seems that your ports are not available. This documentation link gives details:
    https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/InstallationGuide/BeforeYouInstall/Ensure...

  • [root@wavelength-dw]# netstat -atupn | grep LISTEN

    tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      1670/sshd           

    tcp        0      0 127.0.0.1:631               0.0.0.0:*                   LISTEN      1545/cupsd          

    tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      1746/master         

    tcp        0      0 0.0.0.0:5444                0.0.0.0:*                   LISTEN      35655/python        

    tcp        0      0 0.0.0.0:111                 0.0.0.0:*                   LISTEN      1472/rpcbind        

    tcp        0      0 0.0.0.0:58484               0.0.0.0:*                   LISTEN      1490/rpc.statd      

    tcp        0      0 :::22                       :::*                        LISTEN      1670/sshd           

    tcp        0      0 ::1:631                     :::*                        LISTEN      1545/cupsd          

    tcp        0      0 ::1:25                      :::*                        LISTEN      1746/master         

    tcp        0      0 :::56798                    :::*                        LISTEN      1490/rpc.statd      

    tcp        0      0 :::111                      :::*                        LISTEN      1472/rpcbind

    --

    0.0.0.0:5444 is simply_fast.py

  • UDP:

    udp        0      0 0.0.0.0:68                  0.0.0.0:*                               1360/dhclient       

    udp        0      0 0.0.0.0:111                 0.0.0.0:*                               1472/rpcbind        

    udp        0      0 17.207.162.243:123          0.0.0.0:*                               3734/ntpd           

    udp        0      0 127.0.0.1:123               0.0.0.0:*                               3734/ntpd           

    udp        0      0 0.0.0.0:123                 0.0.0.0:*                               3734/ntpd           

    udp        0      0 0.0.0.0:47652               0.0.0.0:*                               1490/rpc.statd      

    udp        0      0 0.0.0.0:631                 0.0.0.0:*                               1545/cupsd          

    udp        0      0 0.0.0.0:799                 0.0.0.0:*                               1472/rpcbind        

    udp        0      0 0.0.0.0:818                 0.0.0.0:*                               1490/rpc.statd      

    udp        0      0 :::59309                    :::*                                    1490/rpc.statd      

    udp        0      0 :::111                      :::*                                    1472/rpcbind        

    udp        0      0 fe80::20c:29ff:fefe:be27:123 :::*                                    3734/ntpd           

    udp        0      0 ::1:123                     :::*                                    3734/ntpd           

    udp        0      0 :::123                      :::*                                    3734/ntpd           

    udp        0      0 :::799                      :::*                                    1472/rpcbind  

  • What about the firewall? This is how it should look:
    # iptables -L

    Chain INPUT (policy ACCEPT)

    target     prot opt source               destination         


    Chain FORWARD (policy ACCEPT)

    target     prot opt source               destination         


    Chain OUTPUT (policy ACCEPT)

    target     prot opt source               destination  


  • [root@wavelength-dw]# iptables -L

    Chain INPUT (policy ACCEPT)

    target     prot opt source               destination         


    Chain FORWARD (policy ACCEPT)

    target     prot opt source               destination         


    Chain OUTPUT (policy ACCEPT)

    target     prot opt source               destination 

  • Strace output from the manual spread execution: http://pastebin.com/bYb96tAa
  • Add debug output to spread and try to run it manually. For testing purposes, your spread.conf should look like this:
    # 1
    # Auto-generated by vertica - do not edit
    Spread_Segment 17.207.162.255:4803 {
      N017207162243    17.207.162.243 {
        17.207.162.243
      }
    }
    # begin end matter
    EventLogFile = /tmp/spread.log
    EventTimeStamp = "[%a %d %b %Y %H:%M:%S]"
    DebugFlags = { PRINT ALL EXIT }
    ExitOnIdle = yes
    Save the file and run spread as:

    /opt/vertica/spread/sbin/spread -c <<THIS SAVED FILE>>

    Then check in /tmp/spread.log what is logged. 
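
Since Vertica regenerates the spread.conf in the catalog, one way to follow the suggestion above is to make the debug changes on a copy. The sketch below is an assumption about how to do that, not a Vertica tool: `make_debug_spread_conf` is a hypothetical name, and it only rewrites the two settings that differ in the example above (EventLogFile and DebugFlags).

```shell
#!/bin/sh
# Hypothetical helper: copy a spread.conf ($1) to a new file ($2), switching
# the event log to /tmp/spread.log and the debug flags to PRINT ALL EXIT.
# All other lines are copied unchanged; the original file is not modified.
make_debug_spread_conf() {
    sed -e 's|^\([[:space:]]*EventLogFile[[:space:]]*=\).*|\1 /tmp/spread.log|' \
        -e 's|^\([[:space:]]*DebugFlags[[:space:]]*=\).*|\1 { PRINT ALL EXIT }|' \
        "$1" > "$2"
}
```

Usage: `make_debug_spread_conf /home/dbadmin/wavelength/v_wavelength_node0001_catalog/spread.conf /tmp/spread-debug.conf`, then pass `/tmp/spread-debug.conf` to `spread -c`. The output path here is only an example.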


  • Working now. From the spread strace output, this stood out: 

    unlink("/tmp/4803")                     = -1 EPERM (Operation not permitted)

    I did an 'ls' on that file and this was the output:

    srw-rw-rw-. 1  501  501    0 Mar 18 01:59 4803

    It looks like that file previously belonged to dbadmin:verticadba, and when I reinstalled I removed that user and group before that file was removed. After the reinstall, the new dbadmin didn't have privileges to remove it, and spread quietly exited when the unlink() failed.
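
A quick check for this situation might look like the sketch below. It is only an illustration: `check_spread_socket` is a hypothetical name, and the default path `/tmp/4803` is taken from the strace output above. On a typical Linux system `/tmp` has the sticky bit set, so a file there can only be unlinked by its owner or by root, which matches the EPERM seen here.

```shell
#!/bin/sh
# Hypothetical helper: report whether a leftover socket file could block spread.
# Prints "absent" if nothing is there, "ok" if the current user owns the file,
# and "stale" (with a non-zero exit status) if someone else owns it.
check_spread_socket() {
    sock="${1:-/tmp/4803}"
    # Nothing left behind, so there is nothing to unlink.
    if [ ! -e "$sock" ] && [ ! -L "$sock" ]; then
        echo "absent"
        return 0
    fi
    # -O is true when the file is owned by the effective user ID.
    if [ -O "$sock" ]; then
        echo "ok"
    else
        echo "stale"
        return 1
    fi
}
```

Usage: run `check_spread_socket /tmp/4803` as dbadmin. If it prints "stale", inspect the file with `ls -l` and have root remove it before creating the database again.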

  • Glad that it works ! Enjoy Vertica now! :)
  • Thanks, Eugenia!
  • My pleasure! Thanks to you, the Vertica community now has a guide to debugging spread in all sorts of ways :)
