Unable to Create Database - Nodes won't come up
I'm trying to install Vertica on a 3-node cluster. The installation seems to go fine, but when I run create_db, only one node comes up. This happens every time, and I'm not sure how to troubleshoot it. Any suggestions on how to get this working?
[dbadmin@ip-10-232-132-8 root]$ /opt/vertica/bin/adminTools -t create_db --hosts 10.232.132.8,10.232.132.9,10.232.132.5 -d test_db -p ofi_db -l /tmp/license.key
Distributing changes to cluster.
Creating database test_db
Starting bootstrap node v_test_db_node0001 (10.232.132.8)
Starting nodes:
v_test_db_node0001 (10.232.132.8)
Some nodes have insufficient entropy for use in definition of unique storage locations. It may take a while for them to get the entropy they need.
Starting Vertica on all nodes. Please wait, databases with large catalog may take a while to initialize.
Node Status: v_test_db_node0001: (DOWN)
Node Status: v_test_db_node0001: (DOWN)
Node Status: v_test_db_node0001: (DOWN)
Node Status: v_test_db_node0001: (DOWN)
Node Status: v_test_db_node0001: (UP)
Creating database nodes
Creating node v_test_db_node0002 (host 10.232.132.9)
Creating node v_test_db_node0003 (host 10.232.132.5)
Generating new configuration information
Starting all nodes
Starting nodes:
v_test_db_node0002 (10.232.132.9)
v_test_db_node0003 (10.232.132.5)
Starting Vertica on all nodes. Please wait, databases with large catalog may take a while to initialize.
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Nodes UP: v_test_db_node0001
Nodes DOWN: v_test_db_node0003, v_test_db_node0002 (may be still initializing).
It is suggested that you continue waiting.
Do you want to continue waiting? (yes/no) [yes] yes
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Comments
To debug, check /opt/vertica/logs/adminTools.log on node0001, and check startup.log and vertica.log in the catalog directory on node0002 and node0003.
No errors in the adminTools log:
2017-02-03 18:14:39.230 at_exec/199464:0x7fb0d4a1d740 [ATRunner._parse_command] Reading complete
2017-02-03 18:14:39.232 at_exec/199464:0x7fb0d4a1d740 [ATRunner.exec_module] ATRunner exec_module: command: module=None version=1.0 args={u'catalogpath': u'/opt/vertica/data/test_db/v_test_db_node0002_catalog'}
2017-02-03 18:14:39.232 at_exec/199464:0x7fb0d4a1d740 [vertica_process_is_up.run_in_subprocess] run_in_subprocess: ['/opt/vertica/bin/vertica', '--status', '-D', u'/opt/vertica/data/test_db/v_test_db_node0002_catalog']None
2017-02-03 18:14:39.321 at_exec/199464:0x7fb0d4a1d740 [vertica_process_is_up.run_in_subprocess] run_in_subprocess: ['ps', '-C', 'vertica', '-o', 'args']None
2017-02-03 18:14:39.656 at_exec/199469:0x7f4fe17da740 [root.setup_custom_logging] New log for 'at_exec'
2017-02-03 18:14:39.657 at_exec/199469:0x7f4fe17da740 [root.setup_custom_logging] sys.argv: '/opt/vertica/share/eggs/vertica/engine/api/at_runner.py' '--module=vertica.engine.api.start_database'
2017-02-03 18:14:39.657 at_exec/199469:0x7f4fe17da740 [ATRunner._parse_command] Reading a line from stdin...
2017-02-03 18:14:39.759 at_exec/199469:0x7f4fe17da740 [ATRunner._parse_command] Reading complete
2017-02-03 18:14:39.761 at_exec/199469:0x7f4fe17da740 [ATRunner.exec_module] ATRunner exec_module: command: module=None version=1.0 args={u'node': {u'name': u'v_test_db_node0002', u'storagelocs': [u'/opt/vertica/data/test_db/v_test_db_node0002_data'], u'oid': u'45035996273721876', u'host': u'10.232.132.9', u'catalogpath': u'/opt/vertica/data/test_db/v_test_db_node0002_catalog', u'controlnode': u'45035996273721876', u'startcmd': u'"/opt/vertica/bin/vertica" "-D" "/opt/vertica/data/test_db/v_test_db_node0002_catalog" "-C" "test_db" "-n" "v_test_db_node0002" "-h" "10.232.132.9" "-p" "5433" "-P" "4803" "-Y" "ipv4"', u'port': u'5433'}, u'last_epoch': None, u'unsafe_startup': False, u'special_environment': None, u'delete_corrupted_data': False}
2017-02-03 18:14:39.761 at_exec/199469:0x7f4fe17da740 [start_database.run_in_subprocess] run_in_subprocess: [u'/opt/vertica/bin/vertica', u'-D', u'/opt/vertica/data/test_db/v_test_db_node0002_catalog', u'-C', u'test_db', u'-n', u'v_test_db_node0002', u'-h', u'10.232.132.9', u'-p', u'5433', u'-P', u'4803', u'-Y', u'ipv4']None
I'm not sure how to interpret startup.log:
[root@ip-10-232-132-9 v_test_db_node0002_catalog]# cat startup.log
{
"node" : "v_test_db_node0002",
"stage" : "Connecting to Spread",
"text" : "Connecting to spread /opt/vertica/spread/tmp/4803",
"timestamp" : "2017-02-03 18:14:39.908"
}
{
"node" : "v_test_db_node0002",
"stage" : "Waiting for Cluster Invite",
"text" : "Prepare to be invited",
"timestamp" : "2017-02-03 18:14:40.001"
}
{
"node" : "v_test_db_node0002",
"stage" : "Waiting for Cluster Invite",
"text" : "Prepare to be invited",
"timestamp" : "2017-02-03 18:14:42.000"
}
{
"node" : "v_test_db_node0002",
"stage" : "Waiting for Cluster Invite",
"text" : "Prepare to be invited",
"timestamp" : "2017-02-03 18:14:44.000"
}
{
"node" : "v_test_db_node0002",
"stage" : "Waiting for Cluster Invite",
"text" : "Prepare to be invited",
"timestamp" : "2017-02-03 18:14:46.000"
}
{
"node" : "v_test_db_node0002",
"stage" : "Waiting for Cluster Invite",
"text" : "Prepare to be invited",
"timestamp" : "2017-02-03 18:14:48.000"
}
{
"node" : "v_test_db_node0002",
"stage" : "Waiting for Cluster Invite",
"text" : "Ready to be invited",
"timestamp" : "2017-02-03 18:14:48.909"
}
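Because startup.log is a series of pretty-printed JSON objects written back to back (not one object per line), a plain json.loads on the whole file fails. A minimal sketch for pulling out the last stage each node reached, assuming the record layout shown above (node, stage, text, timestamp); the function names are hypothetical:

```python
import json
from typing import Dict, Iterator


def iter_startup_records(text: str) -> Iterator[dict]:
    """Yield each JSON object in a startup.log whose records are written
    back to back (pretty-printed, not one per line)."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        record, pos = decoder.raw_decode(text, pos)
        yield record


def last_stage_per_node(text: str) -> Dict[str, dict]:
    """Return the most recent record for each node name."""
    latest: Dict[str, dict] = {}
    for rec in iter_startup_records(text):
        node = rec["node"]
        # "YYYY-MM-DD HH:MM:SS.mmm" timestamps sort correctly as plain strings.
        if node not in latest or rec["timestamp"] >= latest[node]["timestamp"]:
            latest[node] = rec
    return latest
```

Feed it the contents of startup.log from the node's catalog directory (here /opt/vertica/data/test_db/v_test_db_node0002_catalog). For the log above it reports "Waiting for Cluster Invite / Ready to be invited" as the final stage, which suggests the node connected to its local Spread daemon but never heard back from the rest of the cluster.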
And vertica.log shows this:
2017-02-03 18:23:34.000 DiskSpaceRefresher:7f16b17ca700 [Catalog] getLocalStorageLocations: no local node
2017-02-03 18:23:34.000 DiskSpaceRefresher:7f16b17ca700 [Util] Task 'DiskSpaceRefresher' enabled
2017-02-03 18:23:34.001 nameless:7f16c37fe700 [Catalog] getLocalStorageLocations: no local node
2017-02-03 18:23:35.001 nameless:7f16c37fe700 [Catalog] getLocalStorageLocations: no local node
2017-02-03 18:23:36.001 nameless:7f16c37fe700 [Catalog] getLocalStorageLocations: no local node
2017-02-03 18:23:37.001 nameless:7f16c37fe700 [Catalog] getLocalStorageLocations: no local node
[root@ip-10-232-132-5 v_test_db_node0003_catalog]#
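When vertica.log is long, it can help to see which messages repeat the most. A rough sketch, assuming every line has the shape seen in the excerpt above (timestamp, thread name:thread id, [component], message); the regex and function name are illustrative, not part of any Vertica tooling:

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed line shape, based on the excerpt above:
#   <date> <time> <thread name>:<thread id> [<component>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>.+?):(?P<tid>(?:0x)?[0-9a-f]+) "
    r"\[(?P<component>[^\]]+)\] (?P<message>.*)$"
)


def top_messages(lines: Iterable[str], n: int = 5) -> List[Tuple[Tuple[str, str], int]]:
    """Count (component, message) pairs and return the n most frequent."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:  # lines that don't fit the assumed shape are skipped
            counts[(match.group("component"), match.group("message"))] += 1
    return counts.most_common(n)
```

On the six lines above, the top entry is ("Catalog", "getLocalStorageLocations: no local node") with 5 occurrences, i.e. the node keeps running without having joined a cluster that defines it as a local node.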
I'm on AWS and am going to try the install again using the --point-to-point flag.
Found the issue: the security groups were misconfigured. They only allowed TCP for DNS and were missing UDP for Spread.
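A quick way to confirm that UDP actually gets through between two nodes, independent of Vertica, is a one-shot echo test. This is a minimal sketch with hypothetical function names; pick a UDP port the security group is supposed to allow and that nothing else is bound to (for example 4803 on a node where Vertica/Spread is stopped):

```python
import socket
import threading
from typing import Optional


def udp_echo_once(bind_host: str, port: int, timeout: float = 30.0,
                  ready: Optional[threading.Event] = None) -> None:
    """Run on the receiving node: echo one UDP datagram back to its sender.
    Raises socket.timeout if nothing arrives within `timeout` seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.bind((bind_host, port))
        if ready is not None:
            ready.set()  # tells a caller the socket is bound and listening
        data, addr = sock.recvfrom(1024)
        sock.sendto(data, addr)


def udp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Run on the sending node: True if the datagram is echoed back in time."""
    payload = b"vertica-udp-probe"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (host, port))
        try:
            data, _ = sock.recvfrom(1024)
        except (socket.timeout, ConnectionError):
            # Either the datagram was dropped or the OS surfaced an ICMP error.
            return False
        return data == payload
```

For example, run udp_echo_once("0.0.0.0", 4803) on 10.232.132.9, then udp_probe("10.232.132.9", 4803) on 10.232.132.8. A False result with the listener running points at a firewall or security group dropping UDP.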
This prior thread pointed me in the right direction. https://forum.vertica.com/discussion/comment/237056/#Comment_237056
The official documentation on creating a security group was helpful as well: https://my.vertica.com/docs/Ecosystem/Amazon/UsingVerticaonAWSHTML/Default.htm#Authoring/UsingVerticaOnAWS/CreatingASecurityGroup.htm?TocPath=Configure%20Your%20Network|_____7
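To catch this kind of gap before running create_db, the inbound rules can be compared against a list of required protocol/port pairs. A rough sketch: the rules are plain dicts shaped like the IpPermissions entries EC2's DescribeSecurityGroups returns (IpProtocol, FromPort, ToPort, with "-1" meaning all traffic). The REQUIRED list is an assumption: 5433 and 4803 come from the startcmd in the adminTools log above, and the rest follow Vertica's published port requirements, so verify it against the documentation for your version. It only checks protocol and port, not the source CIDR or group.

```python
from typing import Dict, Iterable, List, Tuple

# ASSUMED port list; verify against the Vertica docs for your version.
REQUIRED: List[Tuple[str, int]] = [
    ("tcp", 22),    # SSH, used by the installer and adminTools
    ("tcp", 5433),  # client connections (-p 5433 in the startcmd)
    ("udp", 5433),  # Spread monitoring
    ("tcp", 5434),  # intra-cluster communication
    ("tcp", 4803),  # Spread client connections (-P 4803 in the startcmd)
    ("udp", 4803),  # Spread daemon-to-daemon
    ("udp", 4804),  # Spread daemon-to-daemon
    ("udp", 6543),  # Spread monitor-to-daemon
]

# Rules may name protocols by number; normalize TCP (6) and UDP (17).
_PROTO_ALIASES = {"6": "tcp", "17": "udp"}


def _allows(rule: Dict, proto: str, port: int) -> bool:
    raw = str(rule.get("IpProtocol"))
    rule_proto = _PROTO_ALIASES.get(raw, raw)
    if rule_proto == "-1":  # "all traffic" rule
        return True
    if rule_proto != proto:
        return False
    return rule.get("FromPort", -1) <= port <= rule.get("ToPort", -1)


def missing_ports(rules: Iterable[Dict],
                  required: Iterable[Tuple[str, int]] = REQUIRED) -> List[Tuple[str, int]]:
    """Return the (protocol, port) pairs that no inbound rule allows."""
    rules = list(rules)
    return [(p, n) for p, n in required if not any(_allows(r, p, n) for r in rules)]
```

With TCP-only rules like the ones described above, this reports every UDP entry (5433, 4803, 4804, 6543) as missing.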