Understand Why Vertica Node Went Down

Are there any guidelines to follow to understand why a node went down?

Answers

  • mosheg Employee

    See: https://www.vertica.com/blog/what-should-i-do-when-the-database-node-is-down
    If needed, please open a support case.
    To find the reason behind a crash, support will probably ask you to send a scrutinize log
    or the vertica.log messages from the time the node crashed.
    You can check for any Panic message, or provide all the lines logged just before the node shut down.
    Also provide the "/var/log/messages" output from the same node, in case anything unusual was reported there.
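
As a rough illustration of the advice above, the following Python sketch pulls the vertica.log lines from the minutes before a shutdown and flags the ones that mention a panic or a fatal signal. The function name, the 10-minute window, and the keyword list are assumptions made for this example; the timestamp layout is taken from the log excerpt later in this thread.

```python
from datetime import datetime, timedelta
import re

# Timestamp layout as it appears in the vertica.log excerpt in this thread.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")

# Hypothetical keyword list; extend it to suit your own investigation.
KEYWORDS = ("panic", "fatal signal", "handling signal")


def lines_before_shutdown(lines, shutdown_time, minutes=10):
    """Return (line, flagged) pairs logged in the window before shutdown_time."""
    start = shutdown_time - timedelta(minutes=minutes)
    selected = []
    for line in lines:
        match = TS_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not start with a timestamp
        stamp = datetime.strptime(match.group(1), TS_FORMAT)
        if start <= stamp <= shutdown_time:
            flagged = any(word in line.lower() for word in KEYWORDS)
            selected.append((line.rstrip("\n"), flagged))
    return selected
```

To use it, open the node's vertica.log, pass the file object as `lines`, and pass the approximate shutdown time as a `datetime`. The flagged lines are a reasonable starting point for a support case.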

  • Thanks. Can you list the steps for providing the "/var/log/messages" output?
    When I queried the error_messages table, I saw "Vertica suggests allowing 1 open file per MB of memory, minimum of 65536; see 'ulimit -n'" before the node went down.

  • SruthiA Employee

    /var/log/messages is a file. What is the output of ulimit -n?

  • SruthiA, the affected node shows 258284 for ulimit -n
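
The rule quoted in the warning (1 open file per MB of memory, minimum of 65536) can be checked with simple arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the memory sizes in the usage lines are hypothetical because the node's actual memory is not stated in this thread.

```python
MIN_OPEN_FILES = 65536  # minimum quoted in the Vertica warning text


def suggested_open_files(memory_mb):
    """Suggested 'ulimit -n' value: 1 open file per MB, but at least 65536."""
    return max(MIN_OPEN_FILES, memory_mb)


def meets_suggestion(nofile_limit, memory_mb):
    """True if the reported 'ulimit -n' value satisfies the suggestion."""
    return nofile_limit >= suggested_open_files(memory_mb)


# 258284 is the 'ulimit -n' value reported for the affected node.
# The memory sizes below are hypothetical examples, not values from this thread.
for memory_mb in (131072, 262144):
    print(memory_mb, meets_suggestion(258284, memory_mb))
```

Under this rule, a limit of 258284 would satisfy the suggestion on a node with 128 GB (131072 MB) of memory but not on one with 256 GB (262144 MB), so the comparison depends on the memory actually installed on the node.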

  • I have checked the /var/log/messages file, and it has no information about why the node went down. The Vertica log file has these entries just before the node shut down:
    2019-08-27 11:58:17.711 EEcmdq:7f490cff9700 [Main] Handling signal: 11
    2019-08-27 11:58:18.000 DiskSpaceRefresher:7f4598e08700 [Util] Task 'DiskSpaceRefresher' enabled
    2019-08-27 11:58:18.103 EEcmdq:7f490cff9700 [Main] Received fatal signal SIGSEGV.
    2019-08-27 11:58:18.103 EEcmdq:7f490cff9700 [Main] Info: si_code: 1, si_pid: 93080, si_uid: 0, si_addr: 0x16b98
    2019-08-27 11:58:18.104 EEcmdq:7f490cff9700 @v_mydb_node0001: 01000/5439: Vertica suggests allowing 1 open file per MB of memory, minimum of 65536; see 'ulimit -n'
    No further entries appear after the node went down.
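
The "Info: si_code ..." line in the excerpt above can be decoded a little further. The Python sketch below parses that line and looks up the Linux meaning of si_code for SIGSEGV. The regular expression and the function name are assumptions written for this one log line, not a documented Vertica log format. On Linux, si_pid and si_addr share storage in siginfo_t, so for a SIGSEGV the printed si_pid is most likely the faulting address reinterpreted rather than a real sender process.

```python
import re

# Pattern written for the single "Info:" line in this thread (an assumption).
SIGINFO_PATTERN = re.compile(
    r"si_code: (?P<si_code>-?\d+), si_pid: (?P<si_pid>\d+), "
    r"si_uid: (?P<si_uid>\d+), si_addr: (?P<si_addr>0x[0-9a-fA-F]+)"
)

# Linux si_code values for SIGSEGV, as defined in <signal.h>.
SEGV_CODES = {
    1: "SEGV_MAPERR (address not mapped to object)",
    2: "SEGV_ACCERR (invalid permissions for mapped object)",
}


def decode_siginfo(line):
    """Extract the signal details from a log line, or return None if absent."""
    match = SIGINFO_PATTERN.search(line)
    if match is None:
        return None
    si_code = int(match["si_code"])
    si_pid = int(match["si_pid"])
    si_addr = int(match["si_addr"], 16)
    return {
        "si_code": si_code,
        "meaning": SEGV_CODES.get(si_code, "unknown"),
        "si_addr": si_addr,
        # True when si_pid is just the fault address read through the union.
        "pid_aliases_addr": si_pid == si_addr,
    }
```

For the line in this thread, si_addr 0x16b98 equals 93080 in decimal, the same number printed as si_pid, and si_code 1 corresponds to SEGV_MAPERR. That points to an access to an unmapped address inside the Vertica process, which is the kind of detail a support case can follow up on.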

    Investigating further in dc_errors, I was able to identify the transaction_id associated with the warning in the log above. The transaction belongs to the Vertica process that builds a flex table. This process has run fine for about 2 years (it also ran fine today). I think some other process used too much memory, which caused the flex table build to fail. Here are some other values reported in dc_errors:
    error_level: 19
    line_number: 1012
    function_name: logWarnings
    message: Vertica suggests allowing 1 open file per MB of memory, minimum of 65536; see 'ulimit -n'
    error_code: 64
    vertica_code: 5439
    error_level_name: WARNING
    cursor_position: 0
    I tried searching for the above error_level and error_code/vertica_code numbers but could not find anything. Can someone help me understand what these mean?

  • SruthiA Employee

    Anuska, if possible, could you please open a support case? We can then dig into the scrutinize output and review it further.

  • Will do. Thanks for all your help.