Cannot start database after upgrading Vertica from 9.0.1-7 to 9.1.1

We tried to upgrade from 9.0.1-7 to 9.1.1. The upgrade itself went smoothly, but we can't start the database afterwards. We get the error below in /opt/vertica/log/adminTools.log, and it is the same on every node. Nothing is written to startup.log or vertica.log. During the upgrade we opened a case with Vertica Support, and they did some troubleshooting. In the end we decided to roll back to 9.0.1-7, and the database starts up as usual. Any suggestions are welcome.

Regards,
BoMBaY

Error in /opt/vertica/log/adminTools.log about "Vertica Catalog Editor: broken pipe detected"

2018-08-16 18:47:54.827 at_exec/45050:0x7f0cbc6a9740 [root.setup_custom_logging] <INFO> New log for 'at_exec'
2018-08-16 18:47:54.827 at_exec/45050:0x7f0cbc6a9740 [root.setup_custom_logging] <INFO> sys.argv: '/opt/vertica/share/eggs/vertica/engine/api/at_runner.py' --no-prompt --module 'vertica.engine.api.compute_vdatabase'
2018-08-16 18:47:54.828 at_exec/45050:0x7f0cbc6a9740 [ATRunner._parse_command] <INFO> Reading a line from stdin...
2018-08-16 18:47:54.828 at_exec/45050:0x7f0cbc6a9740 [ATRunner._parse_command] <INFO> Reading complete: [#{u'__rollback__': '********', u'__protocol__': '********', u'catalogpath': u'/xxxxx/xxxxx/xxxxxx_nodexxxx_catalog'}]
2018-08-16 18:47:54.829 at_exec/45050:0x7f0cbc6a9740 [compute_vdatabase.__init__] <INFO> Initialized new instance of compute_vdatabase
2018-08-16 18:47:54.829 at_exec/45050:0x7f0cbc6a9740 [ATRunner.exec_module] <INFO> ATRunner exec_module: command: <ATCommand> module=None version=1.0 args={u'catalogpath': u'/xxxxx/xxxxx/xxxxxx_nodexxxx_catalog'}
2018-08-16 18:47:54.829 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor.__init__] <INFO> Running catalog editor command: ['/opt/vertica/bin/vertica', '-D', u'/xxxxx/xxxxx/xxxxxx_nodexxxx_catalog', '-E', '-z']
2018-08-16 18:47:54.831 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor._parse_header] <INFO> Parsing CE header
2018-08-16 18:47:54.831 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor._recorded_readline] <INFO> About to call readline on catalog editor output
2018-08-16 18:47:54.857 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor._recorded_readline] <INFO> Next line of response was ['']

2018-08-16 18:47:54.857 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor._censor_then_log] <INFO> Sending to catalog editor: ['get singleton Database name\n']
2018-08-16 18:47:54.857 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor.sendCmd] <ERROR> Exception encountered while running catalog editor
Traceback (most recent call last):
File "/opt/vertica/oss/python/lib/python2.7/site-packages/vertica/tools/CatalogEditor.py", line 120, in sendCmd
self.ceproc.stdin.write(cmd_string)
IOError: [Errno 32] Broken pipe
2018-08-16 18:47:54.857 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor.sendCmd] <ERROR>
Vertica Catalog Editor: broken pipe detected
Added stdout messages to error.
BEGIN CE output
END CE output
2018-08-16 18:47:54.857 at_exec/45050:0x7f0cbc6a9740 [ATRunner.exec_module] <ERROR> command got exception: Could not load from Catalog Editor.
Catalog Editor state
CatalogEditor instance
Closed? False
CE subprocess = <subprocess.Popen object at 0x7f0cb476ef50>
CE proc.poll = -11
CE proc.stdin = <open file '<fdopen>', mode 'wb' at 0x7f0cb4801c00>
CE proc.stdout = <open file '<fdopen>', mode 'rb' at 0x7f0cb4801b70>
Exception was Broken pipe
Vertica Catalog Editor: broken pipe detected
Added stdout messages to error.
BEGIN CE output
END CE output
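
A note on reading the log above: `CE proc.poll = -11` is the detail that explains the broken pipe. Python's subprocess module reports a negative return code when the child process was terminated by a signal, and signal 11 on Linux is SIGSEGV, so the catalog editor process had already died by the time admintools tried to write to its stdin. The following is a minimal sketch of decoding such a value (written for Python 3; `describe_returncode` is a hypothetical helper for illustration and is not part of admintools):

```python
import signal


def describe_returncode(returncode):
    """Translate a Popen.poll() / Popen.returncode value into readable text."""
    if returncode is None:
        # poll() returns None while the child is still alive
        return "still running"
    if returncode < 0:
        # A negative value -N means the child was terminated by signal N (POSIX)
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return "killed by {} ({})".format(name, signum)
    return "exited with status {}".format(returncode)


print(describe_returncode(-11))
```

On Linux this reports that the process was killed by SIGSEGV, which is consistent with the empty "BEGIN CE output / END CE output" section: the editor crashed before producing any output.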

Comments

  • Webex with Vertica Support summary:

    Upon starting the DB, admintools just prints:
    Unable to read database catalogs - cannot start database.

    Database did not start successfully

    Checked the catalog directory: it is owned by dbadmin and all files under it look good.
    Permissions in the catalog were:
    drwxr-x--- 4 dbadmin verticadba 4096 Aug 16 10:46 Checkpoints
    drwxr-x--- 2 dbadmin verticadba 4096 Aug 16 10:46 Txnlogs

    In my own cluster they are:
    drwxrwx--- 4 dbadmin verticadba 4096 Aug 16 10:46 Checkpoints
    drwxrwx--- 2 dbadmin verticadba 4096 Aug 16 10:46 Txnlogs

    Asked them to change the permissions to match mine; still the same error.

    ulimit -f says unlimited.

    limits.conf has:
    dbadmin - nice 0
    dbadmin - nofile 258200
    dbadmin - as unlimited
    dbadmin - fsize unlimited
    dbadmin - nproc 258202

    df -h shows plenty of space.

    The customer found that the code in "/opt/vertica/oss/python/lib/python2.7/site-packages/vertica/tools/CatalogEditor.py" (line 120, in sendCmd) differs between 9.1 and 9.0.

    Here is how the tools directory looks:
    [dbadmin@xxxx-xxxx-xxx1 ~]$ cd /opt/vertica/oss/python/lib/python2.7/site-packages/vertica/tools/
    [dbadmin@xxxx-xxxx-xxx1 tools]$ ls -lrt
    total 184
    -rw-rw-r-- 1 root root 1902 Jul 22 23:58 vioperf.py
    -rw-rw-r-- 1 root root 6270 Jul 22 23:58 vertica_key_mgmt.py
    -rw-rw-r-- 1 root root 2876 Jul 22 23:58 vcpuperf.py
    -rw-rw-r-- 1 root root 53 Jul 22 23:58 README
    -rw-rw-r-- 1 root root 5679 Jul 22 23:58 LogRotate.py
    -rw-rw-r-- 1 root root 3069 Jul 22 23:58 license_tool.py
    -rw-rw-r-- 1 root root 0 Jul 22 23:58 __init__.py
    -rw-rw-r-- 1 root root 9818 Jul 22 23:58 eula_checker.py
    -rw-rw-r-- 1 root root 29753 Jul 22 23:58 Diagnostics.py
    -rwxrwxr-x 1 root root 8643 Jul 22 23:58 DBfunctions.py
    -rw-rw-r-- 1 root root 10162 Jul 22 23:58 CatalogEditor.py
    -rwxrwxr-x 1 root root 22122 Jul 22 23:58 ATMain.py
    drwxrwxr-x 4 root root 4096 Aug 16 15:50 Scrutinize
    -rw-r--r-- 1 root root 164 Aug 16 15:52 __init__.pyc
    -rw-r--r-- 1 root root 11163 Aug 16 15:52 CatalogEditor.pyc
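
For anyone repeating the permission comparison from the summary above, it can help to convert the `ls -l` mode strings into the octal values that `chmod` accepts. This is a small sketch only; `mode_string_to_octal` is a hypothetical helper, and it deliberately rejects setuid, setgid, and sticky flags rather than guessing at them:

```python
def mode_string_to_octal(mode_string):
    """Convert an `ls -l` mode such as 'drwxr-x---' to an integer mode.

    The first character (file type) is ignored. Only plain r/w/x/- flags
    are accepted; special bits (s, S, t, T) raise ValueError.
    """
    perms = mode_string[1:10]
    if len(perms) != 9:
        raise ValueError("expected a 10-character mode string")
    expected = "rwxrwxrwx"
    mode = 0
    for i, ch in enumerate(perms):
        if ch == expected[i]:
            # Bit 8 is owner-read, bit 0 is other-execute
            mode |= 1 << (8 - i)
        elif ch != "-":
            raise ValueError("unsupported flag {!r}".format(ch))
    return mode


print(oct(mode_string_to_octal("drwxr-x---")))
print(oct(mode_string_to_octal("drwxrwx---")))
```

The two directory modes quoted in the summary correspond to 750 and 770, so the only difference between the two clusters was group write access.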

  • Can you post editor.log from the _catalog directory?

  • @Ben_Vandiver Nothing is written to editor.log.

  • Today we tried to upgrade from 9.0.1-7 to 9.1.0-5 and hit the same issue. However, upgrading from 9.0.1-7 to 9.0.1-13 succeeded. We then tried to upgrade from 9.0.1-13 to 9.1.0-5 and hit the same issue again. We have now rolled back to 9.0.1-13.

  • @Ben_Vandiver, if you have access to the service request, its ID is SD02266679.

  • Jim_Knicely - Administrator
    edited August 2018

    @Itipong_Chewinp - Did you check that all projection buddies in the current database comply with the new requirements of 9.1?

    See:
    https://my.vertica.com/docs/9.1.x/HTML/index.htm#Authoring/NewFeatures/9.1/9.1.0/UpgradeandInstall.htm

    Did you run the pre-upgrade script?

  • @Jim_Knicely, Yes, we've run the pre-upgrade script. The pre-upgrade script output is below.


    Congratulations! No unsafe projections detected. Upgrade to 9.1 should succeed


  • Jim_Knicely - Administrator
    edited August 2018

    @Itipong_Chewinp - Hmm. After the upgrade you can try and start the DB manually (i.e. without admintools). I attached a zip file that includes a script that should start the nodes in your cluster manually. Do you have a test environment where you can test it?

    It is a bash shell script and you run it like the following example where "test_db" is the name of the DB I am trying to start:

    [dbadmin@vertica01 ~]$ ./manual_start_up.sh test_db
    Starting Vertica processes...
    nohup ssh 192.168.2.200 /opt/vertica/bin/vertica -D /home/dbadmin/test_db/v_test_db_node0001_catalog -C test_db -n v_test_db_node0001 192.168.2.200 -p 5433 -P 4803 -Y ipv4 &
    nohup: appending output to ‘nohup.out’
    nohup ssh 192.168.2.201 /opt/vertica/bin/vertica -D /home/dbadmin/test_db/v_test_db_node0002_catalog -C test_db -n v_test_db_node0002 192.168.2.201 -p 5433 -P 4803 -Y ipv4 &
    nohup: appending output to ‘nohup.out’
    nohup ssh 192.168.2.202 /opt/vertica/bin/vertica -D /home/dbadmin/test_db/v_test_db_node0003_catalog -C test_db -n v_test_db_node0003 192.168.2.202 -p 5433 -P 4803 -Y ipv4 &
    nohup: appending output to ‘nohup.out’
    Verifying status of the database test_db
      ...Test #1
      ...Test #2
      ...Test #3
    The test_db database is up!
    
  • @Jim_Knicely It didn't work. The script went through the 'echo ...Test #$c' loop 40 times, and while it was running I grepped for "vertica -D" on all nodes but found no such process. The one difference from starting via admintools is that a 0-byte startup.log was created.

    One more thing: we got feedback from Vertica Support. They asked us to run the catalog editor command "/opt/vertica/bin/vertica -D /xxxxx/xxxxx/xxxxx/v_xxxxx_node0001_catalog -E -z" manually, but it fails with "Segmentation fault".

    result:
    [dbadmin@xxxxx ~]$ /opt/vertica/bin/vertica -D /xxxxx/xxxxx/xxxxx/v_xxxxx_node0001_catalog -E
    Segmentation fault
    
  • Jim_Knicely - Administrator

    Interesting. What OS are you running? Can you try to run the catalog editor on another node?

  • @Jim_Knicely, we use CentOS Linux release 7.4.1708 (Core). I get the same "Segmentation fault" error when running the catalog editor on the other nodes.

  • Are you willing to send support a core file? Also 'ldd /opt/vertica/bin/vertica' and the content of editor.log if it exists.
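
A core file is only written if the core size limit of the crashing process allows it, so it may be worth checking that limit before reproducing the segmentation fault. The sketch below uses Python's standard `resource` module and assumes a Linux host; the helper names are hypothetical, and where the core file ends up depends on the system's `kernel.core_pattern` setting, which this sketch does not inspect:

```python
import resource


def core_dumps_enabled():
    """Return True if the soft core-size limit for this process is non-zero."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0


def raise_core_limit_to_hard():
    """Raise the soft core-size limit to the hard limit.

    An unprivileged process may raise its soft limit up to the hard limit.
    The new limit applies to this process and to children it starts later.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


print("core dumps enabled:", core_dumps_enabled())
print("limits after raising:", raise_core_limit_to_hard())
```

In an interactive shell the equivalent check is `ulimit -c`. If the hard limit itself is 0, it has to be raised by an administrator (for example in limits.conf) before any core file can be produced.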

  • Jim_Knicely - Administrator

    @Itipong_Chewinp - Per Ben's request, here is what I see:

    [dbadmin@s18384357 ~]$ ldd /opt/vertica/bin/vertica
            linux-vdso.so.1 =>  (0x00007fff8e7fc000)
            libgssapi_krb5.so.2 => /opt/vertica/bin/../lib/libgssapi_krb5.so.2 (0x00007f378053c000)
            libkrb5.so.3 => /opt/vertica/bin/../lib/libkrb5.so.3 (0x00007f3780444000)
            libkrb5support.so.0 => /opt/vertica/bin/../lib/libkrb5support.so.0 (0x00007f378042c000)
            libk5crypto.so.3 => /opt/vertica/bin/../lib/libk5crypto.so.3 (0x00007f37803e4000)
            libcom_err.so.3 => /opt/vertica/bin/../lib/libcom_err.so.3 (0x00007f378043c000)
            libvmalloc.so => /opt/vertica/bin/../lib/libvmalloc.so (0x00007f37803dc000)
            libcrypto.so.10 => /opt/vertica/bin/../lib/libcrypto.so.10 (0x00007f377ff94000)
            libssl.so.10 => /opt/vertica/bin/../lib/libssl.so.10 (0x00007f377fd24000)
            libAutopassCrypto64.so => /opt/vertica/bin/../lib/libAutopassCrypto64.so (0x00007f377fa8c000)
            liblmx64.so => /opt/vertica/bin/../lib/liblmx64.so (0x00007f377f8cc000)
            libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f377f5bc000)
            libm.so.6 => /lib64/libm.so.6 (0x00007f377f334000)
            libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f377f114000)
            librt.so.1 => /lib64/librt.so.1 (0x00007f377ef04000)
            libdl.so.2 => /lib64/libdl.so.2 (0x00007f377ecfc000)
            libc.so.6 => /lib64/libc.so.6 (0x00007f377e964000)
            /lib64/ld-linux-x86-64.so.2 (0x00007f378037c000)
            libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f377e744000)
            libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f377e53c000)
            libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f377e31c000)
            libz.so.1 => /lib64/libz.so.1 (0x00007f377e0fc000)
            libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f377def4000)
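
To compare `ldd` output from a working node against a failing one, it can be easier to reduce it to (library, path) pairs and look at which libraries are missing or are resolved from the operating system rather than from the Vertica install directory. This is only a sketch of such a comparison: the function names are hypothetical, and the `/opt/vertica/` prefix is taken from the paths in the listing above.

```python
import re

# Matches the trailing load address, e.g. " (0x00007f378053c000)"
_ADDRESS = re.compile(r"\s*\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(output):
    """Return a list of (library, resolved_path_or_None) from `ldd` output."""
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            name, _, rest = line.partition("=>")
            rest = rest.strip()
            if rest.startswith("not found"):
                path = None
            else:
                # Strip the address; the vdso entry has no path and yields None
                path = _ADDRESS.sub("", rest) or None
            entries.append((name.strip(), path))
        else:
            # Lines such as "/lib64/ld-linux-x86-64.so.2 (0x...)"
            path = _ADDRESS.sub("", line)
            entries.append((path, path))
    return entries


def missing_libraries(output):
    """Names of libraries that ldd reported as 'not found'."""
    return [line.split("=>")[0].strip()
            for line in output.splitlines()
            if line.strip().endswith("not found")]


def system_libraries(entries, bundled_prefix="/opt/vertica/"):
    """Names of libraries resolved from outside the bundled directory."""
    return [name for name, path in entries
            if path and not path.startswith(bundled_prefix)]
```

Feeding the output from two nodes through `parse_ldd` and comparing the resulting lists would show whether the failing nodes pick up a different system library (for example a different libstdc++) than a node where the binary runs.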
    
