Vertica is not starting up due to ulimit being insufficient
$ admintools -t start_db -d my_database -U
Info: no password specified, using none
Could not login with SSH to host 10.3.2.110
Starting nodes:
v_my_database_node0001 (10.3.2.110) ***UNAVAILABLE***
v_my_database_node0002 (10.3.2.111)
v_my_database_node0003 (10.3.2.112)
Vertica names unique storage locations using a random algorithm. The operating system improves the randomness of its pseudo-random-number generator by gathering 'entropy' from the environment. Either some nodes currently have insufficient entropy to provide sufficient randomness (in which case, the problem is solved by waiting for sufficient entropy to accumulate, a process which should not take much more than a minute) or some error has prevented reading the entropy value from the node(s) (if this process doesn't make progress in a few minutes --- perhaps longer on larger clusters --- contact Vertica technical support).
Starting Vertica on all nodes. Please wait, databases with large catalog may take a while to initialize.
10.3.2.111 failed. Result:
status=1 host=10.3.2.111 content={u'command_list': [u'/opt/vertica/bin/vertica', u'-D', u'/data/my_database/v_my_database_node0002_catalog', u'-C', u'my_database', u'-n', u'v_my_database_node0002', u'-h', u'10.3.2.111', u'-p', u'5433', u'-P', u'4803', u'-Y', u'ipv4', u'-U'], u'db_log_path': u'/data/my_database/dbLog', u'start_stdout': None, u'special_environment': None, u'return_code': 1, u'start_stderr': None}
Check /data/my_database/dbLog for more information.
Press RETURN to continue
Node Status: v_my_database_node0001: (DOWN) v_my_database_node0002: (DOWN) v_my_database_node0003: (DOWN)
Node Status: v_my_database_node0001: (DOWN) v_my_database_node0002: (DOWN) v_my_database_node0003: (DOWN)
Node Status: v_my_database_node0001: (DOWN) v_my_database_node0002: (DOWN) v_my_database_node0003: (DOWN)
Node Status: v_my_database_node0001: (DOWN) v_my_database_node0002: (DOWN) v_my_database_node0003: (DOWN)
Node Status: v_my_database_node0001: (DOWN) v_my_database_node0002: (DOWN) v_my_database_node0003: (DOWN)
Node Status: v_my_database_node0001: (DOWN) v_my_database_node0002: (DOWN) v_my_database_node0003: (DOWN)
Node Status: v_my_database_node0001: (DOWN) v_my_database_node0002: (DOWN) v_my_database_node0003: (DOWN)
Node Status: v_my_database_node0001: (DOWN) v_my_database_node0002: (DOWN) v_my_database_node0003: (DOWN)
Node Status: v_my_database_node0001: (DOWN) v_my_database_node0002: (DOWN) v_my_database_node0003: (DOWN)
Node Status: v_my_database_node0001: (DOWN) v_my_database_node0002: (DOWN) v_my_database_node0003: (DOWN)
Error starting database, no nodes are up
Press RETURN to continue
Database my_database did not start successfully
We saw the following in startup.log:
v_my_database_node0003_catalog]$ tail -f startup.log
{
"node" : "v_my_database_node0003",
"stage" : "Database Halted",
"text" : "Not enough open file handles allowed (1024 available/32768 required); see 'ulimit -n'.\n",
"timestamp" : "2020-10-28 10:43:03.353"
}
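The two numbers in that message are the limit the Vertica process actually inherited (1024) and the minimum it requires (32768). A minimal sketch for pulling them out of startup.log on each node, assuming the message format shown above; `parse_handle_error` is a hypothetical helper name, not a Vertica tool:

```shell
# Hypothetical helper: print "<available> <required>" for every
# "Not enough open file handles" message read from stdin.
parse_handle_error() {
    sed -n 's|.*(\([0-9][0-9]*\) available/\([0-9][0-9]*\) required).*|\1 \2|p'
}

# Usage (run in the node's catalog directory):
#   parse_handle_error < startup.log
```

Lines that do not contain the message produce no output, so an empty result means the node halted for some other reason.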
We made the following changes in /etc/security/limits.conf (followed by a reboot):
* soft nofile 191918
* hard nofile 191918
We still saw the same error
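One thing worth ruling out at this point is whether another entry overrides these lines. A minimal sketch, assuming the standard pam_limits layout in which files under /etc/security/limits.d/ are read in addition to limits.conf; `nofile_entries` is a hypothetical helper name:

```shell
# Hypothetical helper: print every active (non-comment) nofile entry,
# prefixed with the file it came from. Files that do not exist are skipped.
nofile_entries() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        awk -v file="$f" '$1 !~ /^#/ && $3 == "nofile" { print file ": " $0 }' "$f"
    done
}

# Usage:
#   nofile_entries /etc/security/limits.conf /etc/security/limits.d/*.conf
```

When reading the result, keep in mind that the limits.conf man page states that wildcard and group entries are not applied to the root user; root needs an entry with the literal user name.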
When we get a fresh login using su -l -s /bin/bash, ulimit -n shows 191918 for the root user.
Can somebody point out where the problem is? A reboot didn't help, but a new session using su -l -s /bin/bash always does. We are wondering what should be done to fix this permanently. Each time we shut down Vertica, we are not able to bring it back up without getting a new session using su -l -s /bin/bash.
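Because the behaviour differs between sessions, it can help to record the limits that each kind of session actually has. A minimal sketch, assuming the required value of 32768 from the startup.log message; `check_nofile` is a hypothetical helper name:

```shell
# Hypothetical helper: report this shell's open-file limits and succeed
# only if the soft limit meets the required minimum passed as $1.
check_nofile() {
    soft=$(ulimit -Sn)
    hard=$(ulimit -Hn)
    echo "soft=$soft hard=$hard required=$1"
    [ "$soft" = "unlimited" ] || [ "$soft" -ge "$1" ]
}

# Usage: run once in the session that fails and once after su -l,
# then compare the two lines.
#   check_nofile 32768 || echo "soft limit too low in this session"
```

Printing both the soft and the hard value matters here: a session can have a high hard limit and still start processes with a low soft limit.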
Answers
https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/InstallationGuide/BeforeYouInstall/openfilelimits.htm
Hi Jim,
We set the hard and soft limits explicitly, based on the amount of RAM available in the system (191918 MB), and rebooted the boxes. The problem is that the new ulimit is not reflected unless we get a new session using su -l -s /bin/bash. We need to figure out why that is, and this is where we need some ideas.
Besides, we've noticed that the ulimit has always been 191918 for the dbadmin user but 1024 for the root user. Does the ulimit need to be raised for the root user as well?
Thanks,
Karthik
The setting is for the dbadmin user as it's the one that owns the Vertica processes.
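Since the limit that matters is the one inherited by the Vertica process, it may be more reliable to read it from a process than from an interactive shell. A minimal sketch, assuming a Linux /proc filesystem; `proc_nofile` is a hypothetical helper name, and the ssh line assumes dbadmin can SSH between nodes without a password, as admintools does:

```shell
# Hypothetical helper: print "<soft> <hard>" open-file limits of process $1.
proc_nofile() {
    awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}

# Usage (as dbadmin):
#   proc_nofile "$(pgrep -o -u dbadmin vertica)"   # a running Vertica process, if any
#   proc_nofile $$                                  # the current shell
#   ssh 10.3.2.111 'ulimit -n'                      # a non-interactive SSH session
```

If the SSH line reports 1024 while a login shell reports 191918, the difference lies in how the session is created rather than in the values written to limits.conf.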
Just noticed the error at the beginning of your original post:
Could not login with SSH to host 10.3.2.110
What node (IP) were you on when you ran the start_db command? And what user were you?
It was the dbadmin user and I was on node1. Note that after doing su -l -s /bin/bash and switching to the dbadmin user, this works.
@karthik
What is the default shell for root **and** dbadmin? Should be bash.
https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/InstallationGuide/BeforeYouInstall/BASHShellRequirements.htm?tocpath=Installing Vertica|Installing Manually|Before You Install Vertica|Platform Requirements and Recommendations|_____2
Though the documentation references Debian as an example, I have seen this apply to other distributions as well. At one customer the default was csh and Vertica was not happy. We had to change the sh link.
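A quick way to answer the shell question on each node, as a minimal sketch; `login_shell` is a hypothetical helper name, and its optional second argument exists only so the function can be tried against a sample file:

```shell
# Hypothetical helper: print the login shell of user $1, read from a
# passwd-format file ($2, defaulting to /etc/passwd).
login_shell() {
    awk -F: -v user="$1" '$1 == user { print $7 }' "${2:-/etc/passwd}"
}

# Usage:
#   login_shell root
#   login_shell dbadmin
#   readlink -f /bin/sh     # shows what the sh link resolves to
```

This reads /etc/passwd only; if the accounts come from a directory service, `getent passwd dbadmin` shows the same field.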
Hope this helps.