Cannot start database after upgrade vertica from 9.0.1-7 to 9.1.1 — Vertica Forum

We tried to upgrade from 9.0.1-7 to 9.1.1. The upgrade itself went smoothly, but we can't start the database afterwards. We get the error below in /opt/vertica/log/adminTools.log, and it is the same on all of our nodes. Nothing is written to startup.log or vertica.log. During the upgrade we opened a case with Vertica Support, and they did some troubleshooting. In the end we decided to roll back to 9.0.1-7, and the database starts up as usual. Any suggestions are welcome.

Regards,
BoMBaY

Error in /opt/vertica/log/adminTools.log about "Vertica Catalog Editor: broken pipe detected"

2018-08-16 18:47:54.827 at_exec/45050:0x7f0cbc6a9740 [root.setup_custom_logging] <INFO> New log for 'at_exec'
2018-08-16 18:47:54.827 at_exec/45050:0x7f0cbc6a9740 [root.setup_custom_logging] <INFO> sys.argv: '/opt/vertica/share/eggs/vertica/engine/api/at_runner.py' --no-prompt --module 'vertica.engine.api.compute_vdatabase'
2018-08-16 18:47:54.828 at_exec/45050:0x7f0cbc6a9740 [ATRunner._parse_command] <INFO> Reading a line from stdin...
2018-08-16 18:47:54.828 at_exec/45050:0x7f0cbc6a9740 [ATRunner._parse_command] <INFO> Reading complete: [#{u'__rollback__': '********', u'__protocol__': '********', u'catalogpath': u'/xxxxx/xxxxx/xxxxxx_nodexxxx_catalog'}]
2018-08-16 18:47:54.829 at_exec/45050:0x7f0cbc6a9740 [compute_vdatabase.__init__] <INFO> Initialized new instance of compute_vdatabase
2018-08-16 18:47:54.829 at_exec/45050:0x7f0cbc6a9740 [ATRunner.exec_module] <INFO> ATRunner exec_module: command: <ATCommand> module=None version=1.0 args={u'catalogpath': u'/xxxxx/xxxxx/xxxxxx_nodexxxx_catalog'}
2018-08-16 18:47:54.829 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor.__init__] <INFO> Running catalog editor command: ['/opt/vertica/bin/vertica', '-D', u'/xxxxx/xxxxx/xxxxxx_nodexxxx_catalog', '-E', '-z']
2018-08-16 18:47:54.831 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor._parse_header] <INFO> Parsing CE header
2018-08-16 18:47:54.831 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor._recorded_readline] <INFO> About to call readline on catalog editor output
2018-08-16 18:47:54.857 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor._recorded_readline] <INFO> Next line of response was ['']

2018-08-16 18:47:54.857 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor._censor_then_log] <INFO> Sending to catalog editor: ['get singleton Database name\n']
2018-08-16 18:47:54.857 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor.sendCmd] <ERROR> Exception encountered while running catalog editor
Traceback (most recent call last):
File "/opt/vertica/oss/python/lib/python2.7/site-packages/vertica/tools/CatalogEditor.py", line 120, in sendCmd
self.ceproc.stdin.write(cmd_string)
IOError: [Errno 32] Broken pipe
2018-08-16 18:47:54.857 at_exec/45050:0x7f0cbc6a9740 [CatalogEditor.sendCmd] <ERROR>
Vertica Catalog Editor: broken pipe detected
Added stdout messages to error.
BEGIN CE output
END CE output
2018-08-16 18:47:54.857 at_exec/45050:0x7f0cbc6a9740 [ATRunner.exec_module] <ERROR> command got exception: Could not load from Catalog Editor.
Catalog Editor state
CatalogEditor instance
Closed? False
CE subprocess = <subprocess.Popen object at 0x7f0cb476ef50>
CE proc.poll = -11
CE proc.stdin = <open file '<fdopen>', mode 'wb' at 0x7f0cb4801c00>
CE proc.stdout = <open file '<fdopen>', mode 'rb' at 0x7f0cb4801b70>
Exception was Broken pipe
Vertica Catalog Editor: broken pipe detected
Added stdout messages to error.
BEGIN CE output
END CE output
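
A note on the "CE proc.poll = -11" value in the log above: on POSIX systems, Python's subprocess module reports a negative return code -N when the child process was terminated by signal N, and signal 11 on Linux is SIGSEGV. That suggests the catalog editor process had already crashed when admintools tried to write to its stdin, which would produce the broken pipe. The sketch below only decodes such a value; it is written for Python 3 (the admintools in the log run Python 2.7), and `describe_exit` is a hypothetical helper name, not part of Vertica.

```python
import signal


def describe_exit(returncode):
    """Explain a subprocess.Popen poll()/returncode value in words."""
    if returncode is None:
        # poll() returns None while the child is still running.
        return "still running"
    if returncode < 0:
        # On POSIX, -N means the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal %d (%s)" % (-returncode, name)
    return "exited with status %d" % returncode


print(describe_exit(-11))
```

On Linux this prints "killed by signal 11 (SIGSEGV)" for the value seen in the log.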

Comments

  • Webex with Vertica Support summary:

    Upon starting DB the admintools just prints
    Unable to read database catalogs - cannot start database.

    Database did not start successfully

    Checked the catalog directory: it is owned by dbadmin and all files under it look good.
    Permissions in the catalog were:
    drwxr-x--- 4 dbadmin verticadba 4096 Aug 16 10:46 Checkpoints
    drwxr-x--- 2 dbadmin verticadba 4096 Aug 16 10:46 Txnlogs

    In my cluster they are:
    drwxrwx--- 4 dbadmin verticadba 4096 Aug 16 10:46 Checkpoints
    drwxrwx--- 2 dbadmin verticadba 4096 Aug 16 10:46 Txnlogs

    Asked them to change the permissions to match mine; still the same error.

    ulimit -f says unlimited.

    limits.conf has:
    dbadmin - nice 0
    dbadmin - nofile 258200
    dbadmin - as unlimited
    dbadmin - fsize unlimited
    dbadmin - nproc 258202

    df -h shows plenty of space.

    Customer found that there is a difference between 9.1 and 9.0 in the code at File "/opt/vertica/oss/python/lib/python2.7/site-packages/vertica/tools/CatalogEditor.py", line 120, in sendCmd.

    Here is how the tools directory looks:
    [dbadmin@xxxx-xxxx-xxx1 ~]$ cd /opt/vertica/oss/python/lib/python2.7/site-packages/vertica/tools/
    [dbadmin@xxxx-xxxx-xxx1 tools]$ ls -lrt
    total 184
    -rw-rw-r-- 1 root root 1902 Jul 22 23:58 vioperf.py
    -rw-rw-r-- 1 root root 6270 Jul 22 23:58 vertica_key_mgmt.py
    -rw-rw-r-- 1 root root 2876 Jul 22 23:58 vcpuperf.py
    -rw-rw-r-- 1 root root 53 Jul 22 23:58 README
    -rw-rw-r-- 1 root root 5679 Jul 22 23:58 LogRotate.py
    -rw-rw-r-- 1 root root 3069 Jul 22 23:58 license_tool.py
    -rw-rw-r-- 1 root root 0 Jul 22 23:58 __init__.py
    -rw-rw-r-- 1 root root 9818 Jul 22 23:58 eula_checker.py
    -rw-rw-r-- 1 root root 29753 Jul 22 23:58 Diagnostics.py
    -rwxrwxr-x 1 root root 8643 Jul 22 23:58 DBfunctions.py
    -rw-rw-r-- 1 root root 10162 Jul 22 23:58 CatalogEditor.py
    -rwxrwxr-x 1 root root 22122 Jul 22 23:58 ATMain.py
    drwxrwxr-x 4 root root 4096 Aug 16 15:50 Scrutinize
    -rw-r--r-- 1 root root 164 Aug 16 15:52 __init__.pyc
    -rw-r--r-- 1 root root 11163 Aug 16 15:52 CatalogEditor.pyc
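
To make the permission comparison in the summary above concrete: the customer's catalog directories show drwxr-x--- while the support engineer's show drwxrwx---. A minimal sketch that converts such `ls -l` mode strings to octal and shows which bit differs. `mode_to_octal` is a hypothetical helper, and it assumes plain r/w/x/- characters only (no setuid, setgid, or sticky bits).

```python
def mode_to_octal(mode_string):
    """Convert an `ls -l` mode string such as 'drwxr-x---' to permission bits."""
    perms = mode_string[1:10]  # skip the leading file-type character
    if len(perms) != 9:
        raise ValueError("expected a 10-character mode string")
    value = 0
    for ch in perms:
        if ch not in "rwx-":
            raise ValueError("special permission bits are not handled here")
        # Each character other than '-' sets the corresponding bit.
        value = (value << 1) | (0 if ch == "-" else 1)
    return value


customer = mode_to_octal("drwxr-x---")
support = mode_to_octal("drwxrwx---")
print(oct(customer), oct(support), oct(customer ^ support))
```

The only differing bit is 0o020, group write. As reported above, aligning the permissions did not change the error.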

  • Can you post editor.log from the _catalog directory?

  • @Ben_Vandiver Nothing is written to editor.log.

  • Today we tried to upgrade from 9.0.1-7 to 9.1.0-5 and got the same issue. We did upgrade from 9.0.1-7 to 9.0.1-13 successfully, but upgrading from 9.0.1-13 to 9.1.0-5 then hit the same issue. We have now rolled back to 9.0.1-13.

  • @Ben_Vandiver, if you have access to service requests, the Service Request ID is SD02266679.

  • Jim_Knicely - Administrator
    edited August 2018

    @Itipong_Chewinp - Did you check that all projection buddies in the current database comply with the new requirements of 9.1?

    See:
    https://my.vertica.com/docs/9.1.x/HTML/index.htm#Authoring/NewFeatures/9.1/9.1.0/UpgradeandInstall.htm

    Did you run the pre-upgrade script?

  • @Jim_Knicely, yes, we ran the pre-upgrade script. Its output is below.


    Congratulations! No unsafe projections detected. Upgrade to 9.1 should succeed


  • Jim_Knicely - Administrator
    edited August 2018

    @Itipong_Chewinp - Hmm. After the upgrade you can try and start the DB manually (i.e. without admintools). I attached a zip file that includes a script that should start the nodes in your cluster manually. Do you have a test environment where you can test it?

    It is a bash shell script and you run it like the following example where "test_db" is the name of the DB I am trying to start:

    [dbadmin@vertica01 ~]$ ./manual_start_up.sh test_db
    Starting Vertica processes...
    nohup ssh 192.168.2.200 /opt/vertica/bin/vertica -D /home/dbadmin/test_db/v_test_db_node0001_catalog -C test_db -n v_test_db_node0001 192.168.2.200 -p 5433 -P 4803 -Y ipv4 &
    nohup: appending output to ‘nohup.out’
    nohup ssh 192.168.2.201 /opt/vertica/bin/vertica -D /home/dbadmin/test_db/v_test_db_node0002_catalog -C test_db -n v_test_db_node0002 192.168.2.201 -p 5433 -P 4803 -Y ipv4 &
    nohup: appending output to ‘nohup.out’
    nohup ssh 192.168.2.202 /opt/vertica/bin/vertica -D /home/dbadmin/test_db/v_test_db_node0003_catalog -C test_db -n v_test_db_node0003 192.168.2.202 -p 5433 -P 4803 -Y ipv4 &
    nohup: appending output to ‘nohup.out’
    Verifying status of the database test_db
      ...Test #1
      ...Test #2
      ...Test #3
    The test_db database is up!
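
The attached script itself is not shown in the thread, but the per-node command it prints above follows a fixed pattern. A rough sketch that rebuilds that command line from its parts, so the pattern can be checked against another cluster. `build_start_command` and its parameter names are hypothetical; the flags and values are copied from the output above, and the sketch only builds a string without running anything.

```python
def build_start_command(db_name, node_name, host, db_dir,
                        p_opt=5433, cap_p_opt=4803, ip_family="ipv4"):
    """Rebuild the per-node start command shown in the script output above.

    p_opt and cap_p_opt are the values passed to -p and -P; the defaults
    are simply the numbers that appear in the thread.
    """
    catalog = "%s/%s_catalog" % (db_dir, node_name)
    argv = [
        "nohup", "ssh", host,
        "/opt/vertica/bin/vertica",
        "-D", catalog,
        "-C", db_name,
        "-n", node_name, host,
        "-p", str(p_opt),
        "-P", str(cap_p_opt),
        "-Y", ip_family,
    ]
    return " ".join(argv)


print(build_start_command("test_db", "v_test_db_node0001",
                          "192.168.2.200", "/home/dbadmin/test_db"))
```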
    
  • @Jim_Knicely It doesn't work. The script went through the 'echo ...Test #$c' loop 40 times, and during that time I grepped for "vertica -D" on all nodes but didn't find the process. One thing that differs from starting via admintools is that I found a 0-byte startup.log.

    One more thing: we got feedback from Vertica support. They asked us to run the catalog editor command "/opt/vertica/bin/vertica -D /xxxxx/xxxxx/xxxxx/v_xxxxx_node0001_catalog -E -z" manually, but it failed with the error "Segmentation fault".

    result:
    [dbadmin@xxxxx ~]$ /opt/vertica/bin/vertica -D /xxxxx/xxxxx/xxxxx/v_xxxxx_node0001_catalog -E
    Segmentation fault
    
  • Jim_Knicely - Administrator

    Interesting. What OS are you running? Can you try to run the catalog editor on another node?

  • @Jim_Knicely, we use CentOS Linux release 7.4.1708 (Core). I got the same "Segmentation fault" error when running the catalog editor on other nodes.

  • Are you willing to send support a core file? Also 'ldd /opt/vertica/bin/vertica' and the content of editor.log if it exists.
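
For the core file request above, one possible way to rerun a crashing command with core dumps allowed is to raise the soft core-size limit to the hard limit in the child process just before it executes. This is a sketch under assumptions: `run_with_core_limit` is a hypothetical helper, and whether and where a core file is actually written also depends on the system's hard limit and its kernel.core_pattern setting, neither of which is known from the thread.

```python
import resource
import subprocess


def _allow_core_dumps():
    # Runs in the child just before exec: raise the soft core-size limit
    # as far as the hard limit permits.
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_with_core_limit(argv):
    """Run argv with the core limit raised and return its return code.

    A negative return code -N means the process was killed by signal N.
    """
    proc = subprocess.run(argv, stdin=subprocess.DEVNULL,
                          preexec_fn=_allow_core_dumps)
    return proc.returncode
```

It could be called with the catalog editor command from the thread as a list, for example `["/opt/vertica/bin/vertica", "-D", catalog_path, "-E", "-z"]`, where `catalog_path` stands for the node's catalog directory.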

  • Jim_Knicely - Administrator

    @Itipong_Chewinp - Per Ben's request, here is what I see:

    [dbadmin@s18384357 ~]$ ldd /opt/vertica/bin/vertica
            linux-vdso.so.1 =>  (0x00007fff8e7fc000)
            libgssapi_krb5.so.2 => /opt/vertica/bin/../lib/libgssapi_krb5.so.2 (0x00007f378053c000)
            libkrb5.so.3 => /opt/vertica/bin/../lib/libkrb5.so.3 (0x00007f3780444000)
            libkrb5support.so.0 => /opt/vertica/bin/../lib/libkrb5support.so.0 (0x00007f378042c000)
            libk5crypto.so.3 => /opt/vertica/bin/../lib/libk5crypto.so.3 (0x00007f37803e4000)
            libcom_err.so.3 => /opt/vertica/bin/../lib/libcom_err.so.3 (0x00007f378043c000)
            libvmalloc.so => /opt/vertica/bin/../lib/libvmalloc.so (0x00007f37803dc000)
            libcrypto.so.10 => /opt/vertica/bin/../lib/libcrypto.so.10 (0x00007f377ff94000)
            libssl.so.10 => /opt/vertica/bin/../lib/libssl.so.10 (0x00007f377fd24000)
            libAutopassCrypto64.so => /opt/vertica/bin/../lib/libAutopassCrypto64.so (0x00007f377fa8c000)
            liblmx64.so => /opt/vertica/bin/../lib/liblmx64.so (0x00007f377f8cc000)
            libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f377f5bc000)
            libm.so.6 => /lib64/libm.so.6 (0x00007f377f334000)
            libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f377f114000)
            librt.so.1 => /lib64/librt.so.1 (0x00007f377ef04000)
            libdl.so.2 => /lib64/libdl.so.2 (0x00007f377ecfc000)
            libc.so.6 => /lib64/libc.so.6 (0x00007f377e964000)
            /lib64/ld-linux-x86-64.so.2 (0x00007f378037c000)
            libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f377e744000)
            libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f377e53c000)
            libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f377e31c000)
            libz.so.1 => /lib64/libz.so.1 (0x00007f377e0fc000)
            libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f377def4000)
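
When comparing ldd output like the above between a working and a failing system, it can help to separate the libraries that resolve to Vertica's own lib directory from those that come from the operating system. A minimal sketch that parses ldd text; `parse_ldd` and `outside_prefix` are hypothetical helper names, and the sample lines are copied from the output above.

```python
def parse_ldd(text):
    """Map each library name in ldd output to its resolved path, or None."""
    resolved = {}
    for line in text.splitlines():
        parts = line.split()
        if "=>" not in parts:
            continue  # e.g. the dynamic loader line, which has no "=>"
        idx = parts.index("=>")
        target = parts[idx + 1] if idx + 1 < len(parts) else None
        # No usable path: an address only (linux-vdso) or "not found".
        if target is None or target.startswith("(") or target == "not":
            target = None
        resolved[parts[0]] = target
    return resolved


def outside_prefix(resolved, prefix="/opt/vertica/"):
    """Names of libraries that resolve to a path outside the given prefix."""
    return sorted(name for name, path in resolved.items()
                  if path and not path.startswith(prefix))


SAMPLE = """
        linux-vdso.so.1 =>  (0x00007fff8e7fc000)
        libkrb5.so.3 => /opt/vertica/bin/../lib/libkrb5.so.3 (0x00007f3780444000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f377f5bc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f378037c000)
"""

print(outside_prefix(parse_ldd(SAMPLE)))
```

Running the same functions over the output from each system gives two lists that can be compared directly.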
    
