Problems with multi-node install on Debian wheezy

The installer fails when I attempt to do a multinode install.

The problem seems to be somewhere in the BashAdapter class. When I run the installer, I get the following output:

=========================
root@debian:~# /opt/vertica/sbin/install_vertica -s 10.0.23.2,10.0.23.4 -r vertica_7.0.1-0_amd64.deb -u dbadmin -p newpassword -L CE --clean
Vertica Analytic Database 7.0.1-0 Installation Tool


>> Validating options...


Mapping hostnames in --hosts (-s) to addresses...

>> Starting installation tasks.
>> Getting system information for cluster (this may take a while)...

Error: Default shell on the following nodes are not bash. Default shell must be set to bash.
10.0.23.4 '
Exiting...
Installation FAILED with errors.

Installation stopped before any changes were made.
root@debian:~# 
=========================

Looking at the "BashAdapter" Python class that the Vertica installer uses to connect to the other hosts, I see that "echo $SHELL" is used to determine the default shell. When I run the same command from the same environment in which I run the installer, I get the following output:

root@debian:~# ssh 10.0.23.4 "echo $SHELL"
/bin/bash
root@debian:~#
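A note on the manual check above: because "echo $SHELL" is in double quotes, the local shell expands $SHELL before ssh even runs, so that command prints the local value, not the remote one. Single quotes (ssh 10.0.23.4 'echo $SHELL') defer expansion to the remote host. $SHELL is also only an environment variable; the login shell configured for an account is stored in the passwd database. The minimal Python 3 sketch below (not Vertica code; the function names are hypothetical) asks the remote host for the passwd entry with getent and reads its seventh field. The remote command is passed to ssh as a single argument with no local shell involved, so nothing is expanded on the local machine.

```python
import shlex
import subprocess


def parse_login_shell(passwd_entry):
    """Return the login shell, i.e. the 7th colon-separated field of a passwd entry."""
    fields = passwd_entry.strip().split(":")
    if len(fields) != 7:
        raise ValueError("not a passwd entry: %r" % passwd_entry)
    return fields[6]


def remote_login_shell(host, user="root"):
    """Ask `host` over ssh for `user`'s passwd entry and return its login shell.

    The argv list is handed straight to ssh (no local shell), so $-expansion
    can only happen on the remote side.
    """
    remote_cmd = "getent passwd " + shlex.quote(user)
    result = subprocess.run(
        ["ssh", host, remote_cmd],
        capture_output=True, text=True, check=True,
    )
    return parse_login_shell(result.stdout)
```

For example, remote_login_shell("10.0.23.4") would return the shell recorded for root on the second node, independent of what $SHELL happens to contain in any particular session.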

Looking at the code, the following function, defined in /opt/vertica/oss/python/lib/python2.7/site-packages/vertica/install/__init__.py, is being used to check the default shell on the other nodes:

===========================
import sys  # needed for sys.exit() below

def check_default_shell(fullhostname_list, installerSSH):
    # Run "echo $SHELL" on every host; res maps host -> [status, [output lines]]
    installerSSH.setHosts(fullhostname_list)
    Status, res = installerSSH.execute("echo $SHELL", hide=True)

    # Collect every host whose first output line does not mention bash
    wrong_shells = {}
    for k, v in res.items():
        if "bash" not in v[1][0]:
            wrong_shells[k] = v[1][0]

    if len(wrong_shells) > 0:
        print "Error: Default shell on the following nodes are not bash. Default shell must be set to bash."
        for k, v in wrong_shells.items():
            print k, v
        print "Exiting..."
        sys.exit(1)
    else:
        print "Default shell on nodes:"
        for k, v in res.items():
            print k, v[1][0]
============================

If I dump "res.items()" to the screen during the installer's default shell check, I get the following output:

=============================

root@debian:~# /opt/vertica/sbin/install_vertica -s 10.0.23.2,10.0.23.4 -r vertica_7.0.1-0_amd64.deb -u dbadmin -p newpassword -L CE --clean
Vertica Analytic Database 7.0.1-0 Installation Tool


>> Validating options...


Mapping hostnames in --hosts (-s) to addresses...

>> Starting installation tasks.
>> Getting system information for cluster (this may take a while)...

[('10.0.23.2', ['0', ['/bin/bash']]),
 ('10.0.23.4',
  ['/bin/bash\r\nVERTICA MAGIC PROMPT:0',
   ["'", 'echo $SHELL', 'root@debian:~# stty -echo']])]
Error: Default shell on the following nodes are not bash. Default shell must be set to bash.
10.0.23.4 '
Exiting...
Installation FAILED with errors.

Installation stopped before any changes were made.

===============================
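The dump shows why the check misfires. For 10.0.23.2 the value has the shape [status, [output lines]], so v[1][0] is '/bin/bash'. For 10.0.23.4 the shell's output and a "VERTICA MAGIC PROMPT:0" marker have landed in the status slot, and the list holds leftover echo text, so v[1][0] is the lone quote character that gets printed next to the host. The Python 3 sketch below is a hypothetical helper, not Vertica code; it assumes only the shape visible in this dump and keeps malformed replies separate from genuinely non-bash shells, so a garbled reply is not misreported as a wrong shell.

```python
def classify_shell_results(items):
    """Split (host, [status, [lines]]) pairs into ok, wrong-shell and malformed.

    A well-formed reply is assumed to have a purely numeric status string and
    at least one output line; anything else is reported as malformed instead
    of being treated as a non-bash shell.
    """
    ok, wrong, malformed = {}, {}, {}
    for host, value in items:
        status, lines = value[0], value[1]
        if not status.isdigit() or not lines:
            malformed[host] = value
        elif "bash" in lines[0]:
            ok[host] = lines[0]
        else:
            wrong[host] = lines[0]
    return ok, wrong, malformed


# The res.items() list dumped above, copied verbatim.
dumped = [
    ('10.0.23.2', ['0', ['/bin/bash']]),
    ('10.0.23.4',
     ['/bin/bash\r\nVERTICA MAGIC PROMPT:0',
      ["'", 'echo $SHELL', 'root@debian:~# stty -echo']]),
]

ok, wrong, malformed = classify_shell_results(dumped)
print(ok)                 # {'10.0.23.2': '/bin/bash'}
print(wrong)              # {}
print(sorted(malformed))  # ['10.0.23.4']
```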

In the meantime, I see the following in my /opt/vertica/log/install.log file:

===============================

20140813 184955:DEBUG:root:Logging started.  Logging to: ['/opt/vertica/log/install.log']
20140813 184955:INFO:root:------------------------------------------------------------
20140813 184955:INFO:root:Begin create_dba
20140813 184955:INFO:root:------------------------------------------------------------
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:DbaCreate initialized: <UserGroup user='dbadmin' group='verticadba' home='/home/dbadmin'>
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:step: Provided DB Admin account details:
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:progress: user = dbadmin
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:progress: group = verticadba
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:progress: home = /home/dbadmin
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:step: Creating group...
20140813 184955:ERROR:vertica.system.usergroup.UserGroup:Error getting user information: 'getpwnam(): name not found: dbadmin'
20140813 184955:ERROR:vertica.system.usergroup.UserGroup:Error getting group information: 'getgrnam(): name not found: verticadba'
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:progress: Adding group...
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:Creating group: 'verticadba'
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:progress: # /usr/sbin/groupadd verticadba
20140813 184955:DEBUG:vertica.platform.node.dba.DbaCreate:Output of groupadd: ''
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:step: Validating group...
20140813 184955:ERROR:vertica.system.usergroup.UserGroup:Error getting user information: 'getpwnam(): name not found: dbadmin'
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:progress: Okay
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:step: Creating user...
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:progress: Adding user...
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:Creating user: 'dbadmin'
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:progress: # /usr/sbin/useradd --create-home --gid verticadba --home /home/dbadmin --shell /bin/bash --password '*' dbadmin
20140813 184955:DEBUG:vertica.platform.node.dba.DbaCreate:Output of useradd: ''
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:progress: Setting user credentials...
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:progress: # /usr/sbin/chpasswd
20140813 184955:DEBUG:vertica.platform.node.dba.DbaCreate:Output of chpassword: ''
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:step: Validating user...
20140813 184955:DEBUG:vertica.platform.node.dba.DbaCreate:Hidden from user, running /opt/vertica/share/binlib/test/file-access -u dbadmin /home/dbadmin
20140813 184955:DEBUG:vertica.platform.node.dba.DbaCreate:Output of test/file-access: ''
20140813 184955:INFO:vertica.platform.node.dba.DbaCreate:progress: Okay

==================================

So I know that /bin/bash is the default shell on my nodes, despite what the installer thinks. With that in mind, I comment out the call to check_default_shell() made in /opt/vertica/oss/python/lib/python2.7/site-packages/vertica/install/__init__.py and re-run the install command:

==================================

root@debian:~# /opt/vertica/sbin/install_vertica -s 10.0.23.2,10.0.23.4 -r vertica_7.0.1-0_amd64.deb -u dbadmin -p newpassword -L CE --clean
Vertica Analytic Database 7.0.1-0 Installation Tool


>> Validating options...


Mapping hostnames in --hosts (-s) to addresses...

>> Starting installation tasks.
>> Getting system information for cluster (this may take a while)...

There are Vertica adminTool processes running on ['10.0.23.4']. They must be stopped before installation can continue

Installation FAILED with errors.

Installation stopped before any changes were made.

===================================

This time, I get an error saying that "adminTool" processes are running on the other node (10.0.23.4). I run the following commands to check whether /opt/vertica/bin/admintools or /opt/vertica/bin/adminTools is running. This is the output I get (indicating that neither is running):

===================================

root@debian:~# ssh 10.0.23.4 "ps aux | grep -i 'admintools'"
root      3797  0.0  0.0  10752  1368 ?        Ss   02:37   0:00 bash -c ps aux | grep -i 'admintools'
root      3799  0.0  0.0   7832   872 ?        S    02:37   0:00 grep -i admintools
root@debian:~# ssh 10.0.23.4 "ps aux | grep 'vertica'"
dbadmin   1361  0.0  0.0   9236  1312 ?        Ss   00:25   0:00 /bin/bash /opt/vertica/agent/agent.sh /opt/vertica/config/users/dbadmin/agent.conf
dbadmin   1369  0.2  1.0 262988 21036 ?        Sl   00:25   0:22 /opt/vertica/oss/python/bin/python ./simply_fast.py
root      3804  0.0  0.0  10752  1372 ?        Ss   02:38   0:00 bash -c ps aux | grep 'vertica'
root      3806  0.0  0.0   7832   852 ?        S    02:38   0:00 grep vertica
root@debian:~#

====================================

What I see are the processes started by /etc/init.d/verticad and /etc/init.d/vertica_agent.
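A "ps aux | grep" pipeline also matches its own grep process and the "bash -c" wrapper that ssh starts, which is all the admintools search above turned up. The Python 3 sketch below (hypothetical helper name; it only parses text and does not reproduce whatever check the installer itself performs) scans ps aux output for a case-insensitive pattern in the COMMAND column and skips those self-matches. The sample lines are taken from the ps output above. On the command line, "pgrep -f admintools" avoids the same problem, since pgrep never reports itself.

```python
def matching_processes(ps_aux_output, pattern):
    """Return (user, pid, command) for ps aux lines whose COMMAND contains pattern.

    Case-insensitive, like grep -i. Lines belonging to the search pipeline
    itself (the grep process and its "bash -c" wrapper) are skipped.
    """
    pattern = pattern.lower()
    matches = []
    for line in ps_aux_output.splitlines():
        fields = line.split(None, 10)   # ps aux has 11 columns; COMMAND is last
        if len(fields) < 11 or fields[0] == "USER":
            continue                    # header or malformed line
        user, pid, command = fields[0], fields[1], fields[10]
        if command.startswith(("grep ", "bash -c ps ")):
            continue                    # our own pipeline, not a real match
        if pattern in command.lower():
            matches.append((user, pid, command))
    return matches


ps_output = """\
root      3797  0.0  0.0  10752  1368 ?        Ss   02:37   0:00 bash -c ps aux | grep -i 'admintools'
root      3799  0.0  0.0   7832   872 ?        S    02:37   0:00 grep -i admintools
dbadmin   1361  0.0  0.0   9236  1312 ?        Ss   00:25   0:00 /bin/bash /opt/vertica/agent/agent.sh /opt/vertica/config/users/dbadmin/agent.conf
"""

print(matching_processes(ps_output, "admintools"))    # []
print(len(matching_processes(ps_output, "vertica")))  # 1
```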

Still, for good measure, I stop those two processes and re-run the installer.

====================================

root@debian:~# ssh 10.0.23.4 "/etc/init.d/verticad stop; /etc/init.d/vertica_agent stop"
Vertica: stop OK for users: dbadmin
.
Stopping vertica agent: 
root@debian:~# ssh 10.0.23.4 "ps aux | grep 'vertica'"
root      3852  0.0  0.0  10752  1368 ?        Ss   02:41   0:00 bash -c ps aux | grep 'vertica'
root      3854  0.0  0.0   7832   856 ?        S    02:41   0:00 grep vertica

=====================================

Now I know that no Vertica-related processes are running on 10.0.23.4. However, this is the output when I run the installer again (still with the shell check commented out):

=====================================

root@debian:~# /opt/vertica/sbin/install_vertica -s 10.0.23.2,10.0.23.4 -r vertica_7.0.1-0_amd64.deb -u dbadmin -p newpassword -L CE --clean
Vertica Analytic Database 7.0.1-0 Installation Tool


>> Validating options...


Mapping hostnames in --hosts (-s) to addresses...

>> Starting installation tasks.
>> Getting system information for cluster (this may take a while)...

There are Vertica adminTool processes running on ['10.0.23.4']. They must be stopped before installation can continue

Installation FAILED with errors.

Installation stopped before any changes were made.

=======================================

At this point, I comment out the lines in --- that check for "Vertica adminTool processes" and re-run the installer. This is what I get:

=======================================

root@debian:~# /opt/vertica/sbin/install_vertica -s 10.0.23.2,10.0.23.4 -r vertica_7.0.1-0_amd64.deb -u dbadmin -p newpassword -L CE --clean
Vertica Analytic Database 7.0.1-0 Installation Tool


>> Validating options...


Mapping hostnames in --hosts (-s) to addresses...

>> Starting installation tasks.
>> Getting system information for cluster (this may take a while)...

Detected invalid permissions on /opt/vertica directories on the following hosts: ['10.0.23.4']
Permissions must be set to 755 or higher for install_vertica to work correctly.
Installation FAILED with errors.

Installation stopped before any changes were made.
root@debian:~# chmod 755 /opt/vertica/sbin/install_vertica 
root@debian:~# ssh 10.0.23.4 "chmod 755 /opt/vertica/sbin/install_vertica"
root@debian:~# /opt/vertica/sbin/install_vertica -s 10.0.23.2,10.0.23.4 -r vertica_7.0.1-0_amd64.deb -u dbadmin -p newpassword -L CE --clean
Vertica Analytic Database 7.0.1-0 Installation Tool


>> Validating options...


Mapping hostnames in --hosts (-s) to addresses...

>> Starting installation tasks.
>> Getting system information for cluster (this may take a while)...

Detected invalid permissions on /opt/vertica directories on the following hosts: ['10.0.23.4']
Permissions must be set to 755 or higher for install_vertica to work correctly.
Installation FAILED with errors.

Installation stopped before any changes were made.
root@debian:~#

==================================
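The chmod above changes only the install_vertica file, while the error message refers to permissions on /opt/vertica directories. Which directories the installer actually inspects is not shown here, so the Python 3 sketch below simply assumes that every directory under a root such as /opt/vertica should have at least rwxr-xr-x (0o755) and lists the ones that do not. The function name is hypothetical; running it on each node (for example as root, via ssh) would show which directories might need a chmod.

```python
import os
import stat


def dirs_missing_mode(root, required=0o755):
    """Return (path, octal mode) for directories under root lacking any `required` bit."""
    bad = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        mode = stat.S_IMODE(os.stat(dirpath).st_mode)
        if mode & required != required:
            bad.append((dirpath, oct(mode)))
    return bad
```

For example, dirs_missing_mode("/opt/vertica") returns a list such as [('/opt/vertica/somedir', '0o700')] for each offending directory, and an empty list when everything already has the 755 bits.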

My educated guess at this point is that whenever a command is run through the BashAdapter Python class (part of Vertica's Python code for running shell commands remotely), the output is formatted slightly differently from what the installer code expects. As noted earlier, some odd strings end up mixed into the commands' output:

   \r\nVERTICA MAGIC PROMPT:0',
   ["'", 'echo $SHELL', 'root@debian:~# stty -echo']])]
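If that guess is right, the stray text is what you would expect when a command's output and the adapter's end-of-command marker are read out of step. The Python 3 sketch below is purely illustrative: it assumes, from the dump alone and not from BashAdapter's source, that a reply consists of the command's output lines followed by a line "VERTICA MAGIC PROMPT:<exit status>", and it rebuilds the [status, [output lines]] shape from such a buffer, dropping the \r characters an interactive terminal adds. The function name split_reply is hypothetical.

```python
import re

# Assumed marker format, inferred from the dump above: "VERTICA MAGIC PROMPT:<rc>"
MARKER = re.compile(r"^VERTICA MAGIC PROMPT:(\d+)\s*$")


def split_reply(raw):
    """Return [status, [output lines]] from raw text that ends with the marker line."""
    lines = raw.replace("\r\n", "\n").split("\n")
    for i, line in enumerate(lines):
        match = MARKER.match(line)
        if match:
            output = [ln for ln in lines[:i] if ln.strip()]
            return [match.group(1), output]
    raise ValueError("no VERTICA MAGIC PROMPT marker in reply")


print(split_reply('/bin/bash\r\nVERTICA MAGIC PROMPT:0'))  # ['0', ['/bin/bash']]
```

Applied to the string that ended up in the status slot for 10.0.23.4, this yields the same shape the installer received for 10.0.23.2.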
   
 

Comments

  • Hello Ankur,

    We've seen the "Default shell on the following nodes are not bash. Default shell must be set to bash." problem before. To resolve, please try the following:
    When running the install script using sudo, all commands run on all remote nodes over SSH are run as sudo as well. Please follow these steps to resolve this issue:
    NOTE: This needs to be done on all nodes.
    1. In /etc/sudoers, comment out the following line (with a #):
       Defaults    requiretty
       OR add the following line to exempt dbadmin from requiretty:
       Defaults:dbadmin    !requiretty
    2. This step may also be necessary: configure sudo not to ask for a password. In /etc/sudoers, edit dbadmin's entry to look like this:
       dbadmin  ALL=(ALL)   NOPASSWD: ALL
    Thanks,
    Rory

  • Hi Rory,

    Thanks for the pointers.

    Unfortunately, I'm still getting all the same issues, even after explicitly editing my /etc/sudoers file so that tty is not required for the dbadmin user (I'm running the installer as root, but I already have password-less sudo set up for user dbadmin on both nodes).

  • Hi!
    "My educated guess at this point is that whenever a command is run through the BashAdapter python..."
    It's not Python, it's Bash scripts wrapped in Python. The developer doesn't know Python, so he/she actually did it in Bash. O_0

    Python can open files (including random access), determine the OS type, read files, create/delete directories and so on without using Bash, but all the scripts here are actually done in Bash, not Python. That's why you are getting so many stupid errors.
    Awful Python developer(s) at HP Vertica (for example, will everything fail if you have fortune or cowsay installed on your machine?)


    Example 1
        if not (os.path.exists(userPath)):
            cmd = "mkdir -p " + userPath
            os.system(cmd)
            cmd = "chmod go-w " + userPath
            os.system(cmd)
    Python can't create a directory, can't change permissions.


    Example 2
        def admintools_help(option, opt, value, parser):
            if '-t' in sys.argv:
                try:
                    tool = sys.argv[ sys.argv.index( '-t' ) + 1 ]
                    clc = commandLineCtrl.commandLineCtrl(False,False)
                    tools = clc.getCommandLineTools()

    Python has no optparse and argparse modules.

    Example 3
            self.logFile.close()
            destFile = ("%s.zip" % self.outputDir)
            zipCmd = "zip -r"

    Python has no zipfile module.

    Example 4
            os.system("/bin/rm -f " + destFile)
            os.system("sh -c \"cd %s; cd ..; %s %s %s > /dev/null 2> /dev/null\"" % (self.outputDir, zipCmd, destFile, "VerticaDiagnostics." + self.date_nonce))

    Python can't delete files, so you have to do it via Bash, and afterwards you need to parse Bash's exit status to find out whether it worked. Nice!

    Example 5
        def AddLocalStateDumpCommands(self, cmds, dataFileName):
            self.appendCommand(cmds, "ulimit -a", dataFileName)
            self.appendCommand(cmds, "uname -a", dataFileName)
            self.appendCommand(cmds, "rpm -q %s" % DBname.package, dataFileName)
            self.appendCommand(cmds, "if [ -f '/etc/redhat-release' ]; then /bin/cat /etc/redhat-release; fi", dataFileName)
            self.appendCommand(cmds, "if [ -f '/etc/SuSE-release' ]; then /bin/cat /etc/SuSE-release; fi", dataFileName)
    Python can't determine the OS type, and Python can't read files, so you have to do it via Bash.


    Why Python, if everything is actually done with Bash?
            self.appendCommand(cmds, 'for dev in `ls /sys/block`;do  echo "[$dev]";cat /sys/block/$dev/queue/scheduler; done', dataFileName, "I/O Scheduler Settings" )

    python platform
    python ZipFile
    python shell utils
    python file access

    Good Luck.
  • UPDATE

    For example, take a look at this failure: https://community.vertica.com/vertica/topics/installer_fails_with_this_error_vertica_local_coerce_re...

    And why? Because Bash returns strings with escape characters (newlines, colours, clear-screen codes, etc.):
    LANG = 'en_US.UTF-8\n\x1b[H\x1b[2J'
    So it's not Python, it's awful Bash developers trying to do it with Python. Really, I can't understand it, because you then need to parse or interpret the response from Bash. Why do I need this unnecessary layer? Either do it with Bash or do it with Python.
  • How do I resolve this issue?
  • Hi Siddarth!


    The problem above can be resolved by injecting a small piece of code into the source (or with a monkey-patch) that removes the escape sequences. That is not a real solution and should not be used in production code (especially in an expensive enterprise application).

    For example:
    How can I remove the ANSI escape sequences from a string in python
     http://stackoverflow.com/questions/14693701/how-can-i-remove-the-ansi-escape-sequences-from-a-string...
  • Thanks Daniel!!
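Daniel's workaround can be sketched in Python 3. The regular expression below is a simplified version of the approach discussed in the linked Stack Overflow question: it removes CSI escape sequences (ESC, "[", parameter bytes, intermediate bytes, one final byte), which covers the cursor-home and clear-screen codes in the LANG value quoted above, but not every possible terminal control sequence. The function name clean_remote_value is hypothetical, and as Daniel says, this papers over the symptom rather than fixing the installer.

```python
import re

# CSI sequences: ESC '[', parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F,
# then one final byte 0x40-0x7E (e.g. "\x1b[H" cursor home, "\x1b[2J" clear screen).
CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def clean_remote_value(value):
    """Strip CSI escape sequences and surrounding whitespace from shell output."""
    return CSI_RE.sub("", value).strip()


print(clean_remote_value('en_US.UTF-8\n\x1b[H\x1b[2J'))  # en_US.UTF-8
```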
