Are there any guidelines to follow to understand why a node went down?
If needed, please open a support case.
To look for the possible reason behind a crash, support will probably guide you through sending a scrutinize log
or request the vertica.log messages from the time when the node crashed.
You can check for any Panic message, or provide all the lines leading up to the time of the node shutdown.
Also provide the "/var/log/messages" output from the same node, to see whether anything unusual was reported there.
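As a rough illustration of "all the lines before the time of the node shutdown", here is a minimal Python sketch that keeps only the vertica.log lines stamped within a few minutes before the crash. The function name `lines_before` and the 5-minute default are assumptions made for this example, not part of any Vertica tooling. It relies on the timestamp layout seen in vertica.log entries (`YYYY-MM-DD HH:MM:SS.mmm`); /var/log/messages usually uses a different timestamp format and would need its own parsing.

```python
from datetime import datetime, timedelta

# Timestamp layout at the start of each vertica.log entry,
# e.g. "2019-08-27 11:58:17.711" (23 characters).
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 23


def lines_before(lines, crash_time, minutes=5):
    """Return the lines stamped within `minutes` before crash_time (inclusive).

    Lines that do not start with a parseable timestamp are skipped.
    """
    start = crash_time - timedelta(minutes=minutes)
    selected = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            continue  # no leading timestamp on this line
        if start <= stamp <= crash_time:
            selected.append(line)
    return selected
```

To use it on a real file, read the log with `open(path).read().splitlines()` and pass the approximate crash time as a `datetime`; the path to vertica.log depends on your catalog directory.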
Thanks. Can you list the steps for providing the "/var/log/messages" output?
When I queried the error_messages table, I saw "Vertica suggests allowing 1 open file per MB of memory, minimum of 65536; see 'ulimit -n'" before the node went down.
/var/log/messages is a file on the node. What is the output of ulimit -n?
SruthiA, the affected node shows 258284 for ulimit -n
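To compare a reported limit against the suggestion in the message ("1 open file per MB of memory, minimum of 65536"), a small Python sketch might look like the following. The function names are made up for this example, and the memory size has to be supplied by you, since the thread does not state how much memory the node has. Note also that `resource.getrlimit` reports the limit of the Python process itself, which may differ from the limit the Vertica process runs with.

```python
import resource

MINIMUM_OPEN_FILES = 65536  # minimum quoted in the Vertica message


def suggested_open_files(memory_mb):
    """Suggested open-file limit: 1 per MB of memory, at least 65536."""
    return max(MINIMUM_OPEN_FILES, memory_mb)


def limit_is_sufficient(limit, memory_mb):
    """Check a 'ulimit -n' value against the suggestion for memory_mb."""
    if limit == resource.RLIM_INFINITY:
        return True  # unlimited always satisfies the suggestion
    return limit >= suggested_open_files(memory_mb)


def current_open_file_limit():
    """Soft open-file limit of the current process (Unix only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

For example, `limit_is_sufficient(258284, memory_mb)` tells you whether the value reported above meets the suggestion for a node with `memory_mb` megabytes of memory.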
I have checked the /var/log/messages file and it does not contain any relevant information about why the node went down. The Vertica log file has these entries just before the node shut down:
2019-08-27 11:58:17.711 EEcmdq:7f490cff9700 [Main] Handling signal: 11
2019-08-27 11:58:18.000 DiskSpaceRefresher:7f4598e08700 [Util] Task 'DiskSpaceRefresher' enabled
2019-08-27 11:58:18.103 EEcmdq:7f490cff9700 [Main] Received fatal signal SIGSEGV.
2019-08-27 11:58:18.103 EEcmdq:7f490cff9700 [Main] Info: si_code: 1, si_pid: 93080, si_uid: 0, si_addr: 0x16b98
2019-08-27 11:58:18.104 EEcmdq:7f490cff9700 @v_mydb_node0001: 01000/5439: Vertica suggests allowing 1 open file per MB of memory, minimum of 65536; see 'ulimit -n'
No further entries appear after the node went down.
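For anyone scanning a large vertica.log for the same pattern, here is a minimal Python sketch that pulls the fatal signal name and the `si_*` fields out of entries shaped like the ones above. The regular expressions are written against these exact lines only, so treat the layout as an assumption; other Vertica versions may log the fields differently.

```python
import re

# Patterns modelled on the log lines quoted in this thread.
FATAL_RE = re.compile(r"Received fatal signal (\w+)")
INFO_RE = re.compile(
    r"si_code: (\d+), si_pid: (\d+), si_uid: (\d+), si_addr: (0x[0-9a-fA-F]+)"
)


def find_fatal_signal(lines):
    """Collect the fatal signal name and si_* fields from log lines.

    Returns a dict that may contain 'signal', 'si_code', 'si_pid',
    'si_uid' and 'si_addr' (the address parsed from hex to int).
    """
    result = {}
    for line in lines:
        fatal = FATAL_RE.search(line)
        if fatal:
            result["signal"] = fatal.group(1)
        info = INFO_RE.search(line)
        if info:
            result["si_code"] = int(info.group(1))
            result["si_pid"] = int(info.group(2))
            result["si_uid"] = int(info.group(3))
            result["si_addr"] = int(info.group(4), 16)
    return result
```

On Linux, si_code 1 for SIGSEGV normally corresponds to SEGV_MAPERR (an access to an address that is not mapped). In the entries above, si_pid (93080) is the same number as si_addr (0x16b98) written in decimal, which suggests the two fields share storage and that si_pid is probably not a real process ID here.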
Investigating the issue further in dc_errors, I was able to identify the transaction_id associated with the warning in the log above. The transaction belongs to the Vertica process that builds a flex table. This process has been running fine for about 2 years (and it ran fine again today). I suspect that some other process used too much memory, and the flex table build hit the error as a result. Here are some other values reported in dc_errors:
message: Vertica suggests allowing 1 open file per MB of memory, minimum of 65536; see 'ulimit -n'
I tried searching for the above error level and error_code/vertica_code numbers, but could not find anything. Can someone help me understand what they mean?
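One partial pointer: in the vertica.log entry above, the message is prefixed with "01000/5439:". Assuming this follows the usual "SQLSTATE/Vertica code" layout, the first part can be read with the standard SQLSTATE classes, where class 01 means a warning rather than an error. The Python sketch below splits such a prefix; the function name and the small class table are illustrative only, and it does not look up what Vertica code 5439 means.

```python
import re

# Matches a prefix such as "01000/5439:" (assumed SQLSTATE/Vertica code).
CODE_RE = re.compile(r"\b([0-9A-Z]{5})/(\d+):")

# Standard SQLSTATE classes that do not indicate an error.
SQLSTATE_CLASSES = {
    "00": "success",
    "01": "warning",
    "02": "no data",
}


def decode_code(log_line):
    """Return (sqlstate, vertica_code, category) or None if no code is found."""
    match = CODE_RE.search(log_line)
    if match is None:
        return None
    sqlstate = match.group(1)
    vertica_code = int(match.group(2))
    # Any class outside the table is reported generically.
    category = SQLSTATE_CLASSES.get(sqlstate[:2], "error or other condition")
    return sqlstate, vertica_code, category
```

Read this way, the 'ulimit -n' message is a warning that happened to be logged at crash time; it does not by itself show that the open file limit caused the SIGSEGV.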
Anuska, if possible, could you please open a support case? We can then dig into the scrutinize output and review it further.
Will do. Thanks for all your help.