loss of node will cause shutdown

Hi, I am a newbie with Vertica and I have come across an issue. I believe I installed it properly on a single node running SuSE 10.4 (x86_64), but when I start the database, it shuts down several minutes later. I get this message from MC: "Loss of node v_test_node0001 will cause shutdown to occur. K=0 total number of nodes=1".
Any advice or suggestions will be much appreciated.

Comments

  • Abhishek_Rana (Vertica Employee)
    Hi,

    A single-node cluster is never K-safe. K-safety means the database stays available even if some nodes in the cluster go down. Because your database runs on a single node, it is always at K-safety level K=0, so if that node goes down for any reason, your database shuts down automatically.

    Vertica recommends a K-safety level of 1 (K=1), which requires a cluster of at least 3 nodes. With K=1, Vertica keeps a duplicate copy of each projection (that is, of the data). When a node goes down, that copy keeps the database available to users and is also used to recover the failed node.
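
As a rough illustration of the node counts mentioned above, here is a minimal Python sketch. It assumes the commonly documented rule that a K-safe cluster needs at least 2K + 1 nodes; the function names are hypothetical and not part of any Vertica API, and the sketch ignores any cap the product places on K.

```python
def min_nodes_for_k_safety(k: int) -> int:
    """Smallest cluster size that can reach K-safety level k (assumed rule: 2k + 1)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 2 * k + 1


def max_k_safety(node_count: int) -> int:
    """Highest K a cluster of node_count nodes could reach under the same assumed rule."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return (node_count - 1) // 2


print(min_nodes_for_k_safety(1))  # 3
print(max_k_safety(1))            # 0
```

Under that assumption, a single node can only ever be K=0, which matches the message reported by MC.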


  • Hi Abhishek, many thanks for your reply! I checked the "restart policy" using admintools. Since I have a single-node cluster, I should set it to "never" instead of "k-safe", right? However, the database still went down after I made the change and restarted the server. Here is some of the log from "vertica.log". Could you please help me locate the root cause? Thank you in advance.

    2013-11-21 12:23:03.955 Init Session:0x4a2cbb0 @v_test_node0001: 00000/2705: Connection received: host=127.0.0.1 port=41336 (connCnt 1)
    2013-11-21 12:23:03.955 Init Session:0x4a2cbb0 @v_test_node0001: 00000/4540: Received SSL negotiation startup packet
    2013-11-21 12:23:03.955 Init Session:0x4a2cbb0 @v_test_node0001: 00000/4691: Sending SSL negotiation response 'N'
    2013-11-21 12:23:03.955 Init Session:0x4a2cbb0 @v_test_node0001: 00000/4686: sendAuthRequest: user=dbadmin database=dbadmin host=127.0.0.1 authType=0
    2013-11-21 12:23:03.955 Init Session:0x4a2cbb0 @v_test_node0001: 00000/2703: Connection authenticated: user=dbadmin database=dbadmin host=127.0.0.1
    2013-11-21 12:23:03.956 Init Session:0x4a2cbb0 @v_test_node0001: 00000/2609: Client pid: 5924
    2013-11-21 12:23:09.159 Init Session:0x4a2cbb0 [Session] [Query] TX:0(SLES10-5514:0x3a) select * from nodes;
    2013-11-21 12:23:09.160 Init Session:0x4a2cbb0-a000000000013a [Txn] Begin Txn: a000000000013a 'select * from nodes;'
    2013-11-21 12:23:39.015 LowDiskSpaceCheck:0x4c04dd0-a000000000013c [Txn] Begin Txn: a000000000013c 'LowDiskSpaceCheck'
    2013-11-21 12:23:39.016 LowDiskSpaceCheck:0x4c04dd0-a000000000013c [Main] Handling signal: 8
    2013-11-21 12:23:39.020 AnalyzeRowCount:0x5f329c0-a000000000013d [Txn] Begin Txn: a000000000013d 'getRowCountsForProj'
    2013-11-21 12:23:39.022 AnalyzeRowCount:0x5f329c0-a000000000013d [Txn] Rollback Txn: a000000000013d 'getRowCountsForProj'
    2013-11-21 12:23:39.026 AnalyzeRowCount:0x5f329c0 [Util] Task 'AnalyzeRowCount' enabled
    2013-11-21 12:23:39.294 LowDiskSpaceCheck:0x4c04dd0-a000000000013c [Main] Received fatal signal SIGFPE.
    2013-11-21 12:23:39.294 LowDiskSpaceCheck:0x4c04dd0-a000000000013c [Main] Info: si_code: 1, si_pid: 47311191, si_uid: 0, si_addr: 0x2d1e957
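
For anyone scanning a long vertica.log for the same symptom, the following Python sketch pulls out the lines that report a fatal signal. The line shape in the regular expression is an assumption inferred only from the excerpt in this post (timestamp, thread name, a bracketed component, then the message); it is not an official description of the log format, and lines without a bracketed component are simply skipped.

```python
import re

# Assumed line shape, inferred from the excerpt in this thread:
#   "<date> <time> <thread>:<id> [<component>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<thread>.+?) "
    r"\[(?P<component>\w+)\] "
    r"(?P<message>.*)$"
)


def find_fatal_signals(lines):
    """Return (timestamp, task name, message) for each line mentioning a fatal signal."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "fatal signal" in match.group("message"):
            task = match.group("thread").split(":")[0]
            hits.append((match.group("ts"), task, match.group("message")))
    return hits


sample = [
    "2013-11-21 12:23:39.016 LowDiskSpaceCheck:0x4c04dd0-a000000000013c [Main] Handling signal: 8",
    "2013-11-21 12:23:39.294 LowDiskSpaceCheck:0x4c04dd0-a000000000013c [Main] Received fatal signal SIGFPE.",
]
for hit in find_fatal_signals(sample):
    print(hit)
```

In this excerpt the only hit comes from the LowDiskSpaceCheck task, which is a useful hint about where the crash originates.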
  • Hi Phish,

    Hm, that's very interesting...  Your process is receiving a SIGFPE; that's an unhandled division-by-zero error.

    Is there also a file "ErrorReport.txt" alongside the log?  If so, what's in it?

    Also, is there anything in particular that you do that tends to cause the database to shut down?  (It doesn't look like it from this log snippet...)

    Adam
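
The numbers in the log ("Handling signal: 8" and "si_code: 1") can be decoded with a small Python sketch like the one below. The si_code table is copied from the Linux siginfo definitions for SIGFPE and is an assumption about the platform; Python's signal module does not expose these constants, so they are hardcoded here, and describe_signal is a hypothetical helper name.

```python
import signal

# si_code values for SIGFPE as defined by Linux (assumed platform).
FPE_SI_CODES = {
    1: "FPE_INTDIV (integer divide by zero)",
    2: "FPE_INTOVF (integer overflow)",
    3: "FPE_FLTDIV (floating-point divide by zero)",
    4: "FPE_FLTOVF (floating-point overflow)",
    5: "FPE_FLTUND (floating-point underflow)",
    6: "FPE_FLTRES (floating-point inexact result)",
    7: "FPE_FLTINV (invalid floating-point operation)",
    8: "FPE_FLTSUB (subscript out of range)",
}


def describe_signal(signo: int, si_code: int) -> str:
    """Turn a signal number and si_code into a readable description."""
    name = signal.Signals(signo).name
    if name == "SIGFPE":
        return f"{name}: {FPE_SI_CODES.get(si_code, 'unknown si_code')}"
    return name


print(describe_signal(8, 1))
```

On Linux, signal 8 with si_code 1 corresponds to an integer division by zero, which is consistent with the diagnosis above.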
  • Hi Adam, thanks for your reply! I installed the Red Hat version of Vertica on a CentOS virtual server and it runs well now. I checked vertica.log and no PANIC information appears. I had just started the database and run some random SQL, such as "select 1+2 from dual", and several minutes later the database went down by itself. Here is part of the ErrorReport.txt from the former installation on SuSE 10.4:

    Backtrace Generated by Error Signal: [0x0000000000000008] PID: [0x0000000000001625] PC: [0x0000000002d1e957] FP: [0x0000000047008110] SI_ADDR : [0x0000000002d1e957]
    /opt/vertica/bin/vertica(_ZN6Basics9Backtrace11DoBacktraceEiiPvS1_+0x88c)[0x2ad2f7c]
    /opt/vertica/bin/vertica(_ZN6Basics20GlobalSignalHandlers14logFatalSignalEiPvS1_+0xb6)[0x2b584da]
    /opt/vertica/bin/vertica[0x2b597b5]
    /lib64/libc.so.6[0x2b8576e76fc0]
    /opt/vertica/bin/vertica(_ZN3Mon29CheckForLowDiskSpaceTimerTask15runTaskInternalEb+0x457)[0x2d1e957]
    /opt/vertica/bin/vertica(_ZN4Util16TimerServiceTask7runTaskEb+0x4b)[0x2ccec2f]
    /opt/vertica/bin/vertica(_ZN4Util20TimerServiceTaskList18TaskSchedulingInfo10threadShimEPv+0x125)[0x2cd393d]
    /opt/vertica/bin/vertica(_ZNK5boost9function0IvEclEv+0x1be)[0x15b832e]
    /opt/vertica/bin/vertica(_ZN7Session13ThreadManager12launchThreadERKN5boost9function0IvEE+0x57)[0x15b40db]
    /opt/vertica/bin/vertica(thread_proxy+0x80)[0x349a850]
    /lib64/libpthread.so.0[0x2b8577192193]
    /lib64/libc.so.6(__clone+0x6d)[0x2b8576f080dd]
    END BACKTRACE
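
The frames in this backtrace use mangled C++ symbol names. As a reading aid, here is a minimal Python sketch that recovers the namespace, class and function names from the simplest kind of mangled symbol (a plain nested name starting with `_ZN`). It is not a full demangler: it ignores argument types and stops at template arguments, so for real work a tool such as c++filt is the better choice. The function name nested_name is hypothetical.

```python
def nested_name(mangled: str) -> str:
    """Decode the '<length><identifier>' parts of a simple _ZN...E mangled symbol."""
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested mangled name")
    pos = 3
    # An optional 'K' marks a const member function; skip it.
    if pos < len(mangled) and mangled[pos] == "K":
        pos += 1
    parts = []
    while pos < len(mangled) and mangled[pos].isdigit():
        start = pos
        while pos < len(mangled) and mangled[pos].isdigit():
            pos += 1
        length = int(mangled[start:pos])
        parts.append(mangled[pos:pos + length])
        pos += length
    return "::".join(parts)


print(nested_name("_ZN3Mon29CheckForLowDiskSpaceTimerTask15runTaskInternalEb"))
```

Decoded this way, the frame whose address matches SI_ADDR belongs to the low-disk-space timer task, which agrees with the LowDiskSpaceCheck thread name in vertica.log.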
  • Hi, after I reinstalled Vertica using the Red Hat version on CentOS, it has run well for several days. May I know why this "SIGFPE" problem occurred and how to avoid such issues in the future? Is there a mandatory OS version requirement for installing Vertica? Thanks in advance.
  • I am having this exact issue on Debian. What is the solution here? Mine shuts down with the same message after about a minute of running.
  • Here is another stacktrace from ErrorReport.txt
    BEGIN BACKTRACE
    Vertica Backtrace at Fri Feb  7 13:34:05 2014
    -------------------------
    Vertica Analytic Database v7.0.0-1 $BrandId$
    vertica(v7.0.0-1) built by release@build2.verticacorp.com from releases/VER_7_0_RELEASE_BUILD_0_1_20131219@127765 on 'Thu Dec 19  8:48:04 America/New_York 2013' $BuildId$
    00400000-04944000 r-xp 00000000 00:15 92085 /opt/vertica/bin/vertica
    04b44000-04dce000 rw-p 04544000 00:15 92085 /opt/vertica/bin/vertica
    04dce000-04edc000 rw-p 00000000 00:00 0
    058c3000-05cf5000 rw-p 00000000 00:00 0 [heap]
    7f6a2e6f3000-7f6a2f638000 r--s 00000000 00:15 93047 /opt/vertica/share/icu/icudt42l.dat
    7f6a2f638000-7f6a3f798000 rw-p 00000000 00:00 0
    7f6a3f798000-7f6a3fb98000 rw-p 00000000 00:00 0
    7f6a3fb98000-7f6a3fba4000 r-xp 00000000 00:15 122288 /lib/x86_64-linux-gnu/libnss_files-2.17.so
    7f6a3fba4000-7f6a3fda3000 ---p 0000c000 00:15 122288 /lib/x86_64-linux-gnu/libnss_files-2.17.so
    7f6a3fda3000-7f6a3fda4000 r--p 0000b000 00:15 122288 /lib/x86_64-linux-gnu/libnss_files-2.17.so
    7f6a3fda4000-7f6a3fda5000 rw-p 0000c000 00:15 122288 /lib/x86_64-linux-gnu/libnss_files-2.17.so
    7f6a3fda5000-7f6a3fdaf000 r-xp 00000000 00:15 122286 /lib/x86_64-linux-gnu/libnss_nis-2.17.so
    7f6a3fdaf000-7f6a3ffae000 ---p 0000a000 00:15 122286 /lib/x86_64-linux-gnu/libnss_nis-2.17.so
    7f6a3ffae000-7f6a3ffaf000 r--p 00009000 00:15 122286 /lib/x86_64-linux-gnu/libnss_nis-2.17.so
    7f6a3ffaf000-7f6a3ffb0000 rw-p 0000a000 00:15 122286 /lib/x86_64-linux-gnu/libnss_nis-2.17.so
    7f6a3ffb0000-7f6a3ffc5000 r-xp 00000000 00:15 122299 /lib/x86_64-linux-gnu/libnsl-2.17.so
    7f6a3ffc5000-7f6a401c4000 ---p 00015000 00:15 122299 /lib/x86_64-linux-gnu/libnsl-2.17.so
    7f6a401c4000-7f6a401c5000 r--p 00014000 00:15 122299 /lib/x86_64-linux-gnu/libnsl-2.17.so
    7f6a401c5000-7f6a401c6000 rw-p 00015000 00:15 122299 /lib/x86_64-linux-gnu/libnsl-2.17.so
    7f6a401c6000-7f6a401c8000 rw-p 00000000 00:00 0
    7f6a401c8000-7f6a401cf000 r-xp 00000000 00:15 122285 /lib/x86_64-linux-gnu/libnss_compat-2.17.so
    7f6a401cf000-7f6a403ce000 ---p 00007000 00:15 122285 /lib/x86_64-linux-gnu/libnss_compat-2.17.so
    7f6a403ce000-7f6a403cf000 r--p 00006000 00:15 122285 /lib/x86_64-linux-gnu/libnss_compat-2.17.so
    7f6a403cf000-7f6a403d0000 rw-p 00007000 00:15 122285 /lib/x86_64-linux-gnu/libnss_compat-2.17.so
    7f6a403d0000-7f6a403f5000 r-xp 00000000 00:15 47737 /lib/x86_64-linux-gnu/libtinfo.so.5.9
    7f6a403f5000-7f6a405f5000 ---p 00025000 00:15 47737 /lib/x86_64-linux-gnu/libtinfo.so.5.9
    7f6a405f5000-7f6a405f9000 r--p 00025000 00:15 47737 /lib/x86_64-linux-gnu/libtinfo.so.5.9
    7f6a405f9000-7f6a405fa000 rw-p 00029000 00:15 47737 /lib/x86_64-linux-gnu/libtinfo.so.5.9
    7f6a405fa000-7f6a4060f000 r-xp 00000000 00:15 122736 /lib/x86_64-linux-gnu/libgcc_s.so.1
    7f6a4060f000-7f6a4080f000 ---p 00015000 00:15 122736 /lib/x86_64-linux-gnu/libgcc_s.so.1
    7f6a4080f000-7f6a40810000 rw-p 00015000 00:15 122736 /lib/x86_64-linux-gnu/libgcc_s.so.1
    7f6a40810000-7f6a40827000 r-xp 00000000 00:15 122284 /lib/x86_64-linux-gnu/libpthread-2.17.so
    7f6a40827000-7f6a40a26000 ---p 00017000 00:15 122284 /lib/x86_64-linux-gnu/libpthread-2.17.so
    7f6a40a26000-7f6a40a27000 r--p 00016000 00:15 122284 /lib/x86_64-linux-gnu/libpthread-2.17.so
    7f6a40a27000-7f6a40a28000 rw-p 00017000 00:15 122284 /lib/x86_64-linux-gnu/libpthread-2.17.so
    7f6a40a28000-7f6a40a2c000 rw-p 00000000 00:00 0
    7f6a40a2c000-7f6a40a2f000 r-xp 00000000 00:15 122290 /lib/x86_64-linux-gnu/libdl-2.17.so
    7f6a40a2f000-7f6a40c2e000 ---p 00003000 00:15 122290 /lib/x86_64-linux-gnu/libdl-2.17.so
    7f6a40c2e000-7f6a40c2f000 r--p 00002000 00:15 122290 /lib/x86_64-linux-gnu/libdl-2.17.so
    7f6a40c2f000-7f6a40c30000 rw-p 00003000 00:15 122290 /lib/x86_64-linux-gnu/libdl-2.17.so
    7f6a40c30000-7f6a40dd3000 r-xp 00000000 00:15 122291 /lib/x86_64-linux-gnu/libc-2.17.so
    7f6a40dd3000-7f6a40fd2000 ---p 001a3000 00:15 122291 /lib/x86_64-linux-gnu/libc-2.17.so
    7f6a40fd2000-7f6a40fd6000 r--p 001a2000 00:15 122291 /lib/x86_64-linux-gnu/libc-2.17.so
    7f6a40fd6000-7f6a40fd8000 rw-p 001a6000 00:15 122291 /lib/x86_64-linux-gnu/libc-2.17.so
    7f6a40fd8000-7f6a40fdc000 rw-p 00000000 00:00 0
    7f6a40fdc000-7f6a40fe3000 r-xp 00000000 00:15 122279 /lib/x86_64-linux-gnu/librt-2.17.so
    7f6a40fe3000-7f6a411e2000 ---p 00007000 00:15 122279 /lib/x86_64-linux-gnu/librt-2.17.so
    7f6a411e2000-7f6a411e3000 r--p 00006000 00:15 122279 /lib/x86_64-linux-gnu/librt-2.17.so
    7f6a411e3000-7f6a411e4000 rw-p 00007000 00:15 122279 /lib/x86_64-linux-gnu/librt-2.17.so
    7f6a411e4000-7f6a41207000 r-xp 00000000 00:15 52222 /lib/x86_64-linux-gnu/libncurses.so.5.9
    7f6a41207000-7f6a41406000 ---p 00023000 00:15 52222 /lib/x86_64-linux-gnu/libncurses.so.5.9
    7f6a41406000-7f6a41407000 r--p 00022000 00:15 52222 /lib/x86_64-linux-gnu/libncurses.so.5.9
    7f6a41407000-7f6a41408000 rw-p 00023000 00:15 52222 /lib/x86_64-linux-gnu/libncurses.so.5.9
    7f6a41408000-7f6a41429000 r-xp 00000000 00:15 122282 /lib/x86_64-linux-gnu/ld-2.17.so
    7f6a41478000-7f6a41601000 r--p 00000000 00:15 71118 /usr/lib/locale/locale-archive
    7f6a41601000-7f6a41622000 rw-p 00000000 00:00 0
    7f6a41623000-7f6a41629000 rw-p 00000000 00:00 0
    7f6a41629000-7f6a4162a000 r--p 00021000 00:15 122282 /lib/x86_64-linux-gnu/ld-2.17.so
    7f6a4162a000-7f6a4162c000 rw-p 00022000 00:15 122282 /lib/x86_64-linux-gnu/ld-2.17.so
    7fff7a1b1000-7fff7a1d2000 rw-p 00000000 00:00 0 [stack]
    7fff7a1fe000-7fff7a200000 r-xp 00000000 00:00 0 [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

    Backtrace Generated by Error Signal: [0x0000000000000008] PID: [0x00000000000028c3] PC: [0x00000000029fe0a5] FP: [0x00007f69bd7f4c20] SI_ADDR : [0x00000000029fe0a5]
    /opt/vertica/bin/vertica(_ZN6Basics9Backtrace11DoBacktraceEiiPvS1_+0x8cc)[0x3370fee]
    /opt/vertica/bin/vertica(_ZN6Basics20GlobalSignalHandlers14logFatalSignalEiPvS1_+0xc7)[0x33f0205]
    /opt/vertica/bin/vertica[0x33f07f3]
    /lib/x86_64-linux-gnu/libc.so.6(+0x35250)[0x7f6a40c65250]
    /opt/vertica/bin/vertica(_ZN3Mon29CheckForLowDiskSpaceTimerTask15runTaskInternalEb+0x495)[0x29fe0a5]
    /opt/vertica/bin/vertica(_ZN4Util16TimerServiceTask7runTaskEb+0x4b)[0x3550d63]
    /opt/vertica/bin/vertica(_ZN4Util20TimerServiceTaskList18TaskSchedulingInfo10threadShimEPv+0x215)[0x3556b1b]
    /opt/vertica/bin/vertica(_ZNK5boost9function0IvEclEv+0x1bb)[0x13459fb]
    /opt/vertica/bin/vertica(_ZN7Session13ThreadManager12launchThreadERKN5boost9function0IvEE+0x57)[0x32478ef]
    /opt/vertica/bin/vertica(thread_proxy+0x80)[0x39e69f0]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e0e)[0x7f6a40817e0e]
    /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f6a40d190fd]
    END BACKTRACE
    THREAD CONTEXT
    Thread type: Timer Thread
    Request: Unknown request
    Transaction: [0x00a0000000000115]
    END THREAD CONTEXT
  • Just did a fresh install, on a completely vanilla Debian 7 instance. Still the same thing...
  • Yet another fresh install, this time with Debian 6. Exact same issue. Some help would be greatly appreciated. I am trying to set up Vertica for an evaluation.
  • Hi Chris,

    Are you installing on actual hardware or a VM? Are you running any queries before it goes down, or does it go down whether you have a VSQL session running or not?
  • I'm running in an LXC container. No queries are run; it just goes down without my touching it. I just tried on an Ubuntu 12.04 LTS container with the same result. I will try it in a VM next in case it is LXC complicating things.
  • It seems the VBox Image worked fine. It must be a problem with the LXC container...
  • Great! Let us know if you see the same issue with the VM image.  I don't believe we support LXC even for evaluation purposes. 
  • I believe that the container is reporting some interesting values for filesystem size and/or inode count.  Could you post the output of 'df' and 'df -i'?
  • # df
    Filesystem     1K-blocks     Used Available Use% Mounted on
    /dev/sdc1      488385536 21028164 465823788   5% /
    tmpfs            1642984       36   1642948   1% /run
    tmpfs               5120        0      5120   0% /run/lock
    tmpfs            6623980        0   6623980   0% /run/shm

    # df -i
    Filesystem      Inodes IUsed   IFree IUse% Mounted on
    /dev/sdc1            0     0       0     - /
    tmpfs          2053726    24 2053702    1% /run
    tmpfs          2053726     1 2053725    1% /run/lock
    tmpfs          2053726     2 2053724    1% /run/shm




  • That's it - your /dev/sdc1 is claiming to have 0 inodes available or free.  Given that lack of sufficient inodes presents as out of disk space, our low disk space checker looks at inodes.  Clearly I need to add a special case for this situation.  Thanks for your help!
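
To make the failure mode concrete, here is a minimal Python sketch of an inode usage check with the kind of special case described above. It is not Vertica's implementation; it only assumes that a checker computing a percentage from the total inode count would divide by zero when the filesystem reports 0 inodes, as /dev/sdc1 does in the `df -i` output. The function names are hypothetical.

```python
import os


def inode_use_percent(total_inodes: int, free_inodes: int):
    """Return the percentage of inodes in use, or None if the filesystem reports none."""
    if total_inodes == 0:
        # Special case: some filesystems and containers report 0 inodes.
        # Dividing by the total here would raise a division-by-zero error.
        return None
    used = total_inodes - free_inodes
    return 100.0 * used / total_inodes


def inode_use_percent_for_path(path: str):
    """Same check, reading the counts for a mounted path via statvfs."""
    stats = os.statvfs(path)
    return inode_use_percent(stats.f_files, stats.f_ffree)


print(inode_use_percent(0, 0))              # None
print(inode_use_percent(2053726, 2053702))  # tmpfs /run from the df -i output
```

Returning None lets the caller treat "inode count unknown" separately from "inodes nearly exhausted" instead of crashing.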
  • That makes sense, I never even noticed. Glad I could help, I hope in the future there is a possibility to run in an LXC, even if that is in the form of a more relevant error message. Cheers
