Unable to Create Database - Nodes won't come up

I'm trying to install Vertica on a 3-node cluster. The installation seems to go fine, but when I run create_db only 1 node comes up. I see the output below over and over, and I'm not sure how to troubleshoot it. Any suggestions on how to get this working?

[dbadmin@ip-10-232-132-8 root]$ /opt/vertica/bin/adminTools -t create_db --hosts 10.232.132.8,10.232.132.9,10.232.132.5 -d test_db -p ofi_db -l /tmp/license.key
Distributing changes to cluster.
Creating database test_db
Starting bootstrap node v_test_db_node0001 (10.232.132.8)
Starting nodes:
v_test_db_node0001 (10.232.132.8)
Some nodes have insufficient entropy for use in definition of unique storage locations. It may take a while for them to get the entropy they need.
Starting Vertica on all nodes. Please wait, databases with large catalog may take a while to initialize.
Node Status: v_test_db_node0001: (DOWN)
Node Status: v_test_db_node0001: (DOWN)
Node Status: v_test_db_node0001: (DOWN)
Node Status: v_test_db_node0001: (DOWN)
Node Status: v_test_db_node0001: (UP)
Creating database nodes
Creating node v_test_db_node0002 (host 10.232.132.9)
Creating node v_test_db_node0003 (host 10.232.132.5)
Generating new configuration information
Starting all nodes
Starting nodes:
v_test_db_node0002 (10.232.132.9)
v_test_db_node0003 (10.232.132.5)
Starting Vertica on all nodes. Please wait, databases with large catalog may take a while to initialize.
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Nodes UP: v_test_db_node0001
Nodes DOWN: v_test_db_node0003, v_test_db_node0002 (may be still initializing).
It is suggested that you continue waiting.
Do you want to continue waiting? (yes/no) [yes] yes
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)
Node Status: v_test_db_node0001: (UP) v_test_db_node0002: (DOWN) v_test_db_node0003: (DOWN)

Comments

  • To debug, check /opt/vertica/logs/adminTools.log on node0001, then check startup.log and vertica.log in the catalog directory on node0002 and node0003.

  • No errors in the adminTools log:
    2017-02-03 18:14:39.230 at_exec/199464:0x7fb0d4a1d740 [ATRunner._parse_command] Reading complete
    2017-02-03 18:14:39.232 at_exec/199464:0x7fb0d4a1d740 [ATRunner.exec_module] ATRunner exec_module: command: module=None version=1.0 args={u'catalogpath': u'/opt/vertica/data/test_db/v_test_db_node0002_catalog'}
    2017-02-03 18:14:39.232 at_exec/199464:0x7fb0d4a1d740 [vertica_process_is_up.run_in_subprocess] run_in_subprocess: ['/opt/vertica/bin/vertica', '--status', '-D', u'/opt/vertica/data/test_db/v_test_db_node0002_catalog']None
    2017-02-03 18:14:39.321 at_exec/199464:0x7fb0d4a1d740 [vertica_process_is_up.run_in_subprocess] run_in_subprocess: ['ps', '-C', 'vertica', '-o', 'args']None
    2017-02-03 18:14:39.656 at_exec/199469:0x7f4fe17da740 [root.setup_custom_logging] New log for 'at_exec'
    2017-02-03 18:14:39.657 at_exec/199469:0x7f4fe17da740 [root.setup_custom_logging] sys.argv: '/opt/vertica/share/eggs/vertica/engine/api/at_runner.py' '--module=vertica.engine.api.start_database'
    2017-02-03 18:14:39.657 at_exec/199469:0x7f4fe17da740 [ATRunner._parse_command] Reading a line from stdin...
    2017-02-03 18:14:39.759 at_exec/199469:0x7f4fe17da740 [ATRunner._parse_command] Reading complete
    2017-02-03 18:14:39.761 at_exec/199469:0x7f4fe17da740 [ATRunner.exec_module] ATRunner exec_module: command: module=None version=1.0 args={u'node': {u'name': u'v_test_db_node0002', u'storagelocs': [u'/opt/vertica/data/test_db/v_test_db_node0002_data'], u'oid': u'45035996273721876', u'host': u'10.232.132.9', u'catalogpath': u'/opt/vertica/data/test_db/v_test_db_node0002_catalog', u'controlnode': u'45035996273721876', u'startcmd': u'"/opt/vertica/bin/vertica" "-D" "/opt/vertica/data/test_db/v_test_db_node0002_catalog" "-C" "test_db" "-n" "v_test_db_node0002" "-h" "10.232.132.9" "-p" "5433" "-P" "4803" "-Y" "ipv4"', u'port': u'5433'}, u'last_epoch': None, u'unsafe_startup': False, u'special_environment': None, u'delete_corrupted_data': False}
    2017-02-03 18:14:39.761 at_exec/199469:0x7f4fe17da740 [start_database.run_in_subprocess] run_in_subprocess: [u'/opt/vertica/bin/vertica', u'-D', u'/opt/vertica/data/test_db/v_test_db_node0002_catalog', u'-C', u'test_db', u'-n', u'v_test_db_node0002', u'-h', u'10.232.132.9', u'-p', u'5433', u'-P', u'4803', u'-Y', u'ipv4']None

    I'm not sure how to interpret startup.log:
    [root@ip-10-232-132-9 v_test_db_node0002_catalog]# cat startup.log
    {
    "node" : "v_test_db_node0002",
    "stage" : "Connecting to Spread",
    "text" : "Connecting to spread /opt/vertica/spread/tmp/4803",
    "timestamp" : "2017-02-03 18:14:39.908"
    }
    {
    "node" : "v_test_db_node0002",
    "stage" : "Waiting for Cluster Invite",
    "text" : "Prepare to be invited",
    "timestamp" : "2017-02-03 18:14:40.001"
    }
    {
    "node" : "v_test_db_node0002",
    "stage" : "Waiting for Cluster Invite",
    "text" : "Prepare to be invited",
    "timestamp" : "2017-02-03 18:14:42.000"
    }
    {
    "node" : "v_test_db_node0002",
    "stage" : "Waiting for Cluster Invite",
    "text" : "Prepare to be invited",
    "timestamp" : "2017-02-03 18:14:44.000"
    }
    {
    "node" : "v_test_db_node0002",
    "stage" : "Waiting for Cluster Invite",
    "text" : "Prepare to be invited",
    "timestamp" : "2017-02-03 18:14:46.000"
    }
    {
    "node" : "v_test_db_node0002",
    "stage" : "Waiting for Cluster Invite",
    "text" : "Prepare to be invited",
    "timestamp" : "2017-02-03 18:14:48.000"
    }
    {
    "node" : "v_test_db_node0002",
    "stage" : "Waiting for Cluster Invite",
    "text" : "Ready to be invited",
    "timestamp" : "2017-02-03 18:14:48.909"
    }

    and vertica.log shows this:

    2017-02-03 18:23:34.000 DiskSpaceRefresher:7f16b17ca700 [Catalog] getLocalStorageLocations: no local node
    2017-02-03 18:23:34.000 DiskSpaceRefresher:7f16b17ca700 [Util] Task 'DiskSpaceRefresher' enabled
    2017-02-03 18:23:34.001 nameless:7f16c37fe700 [Catalog] getLocalStorageLocations: no local node
    2017-02-03 18:23:35.001 nameless:7f16c37fe700 [Catalog] getLocalStorageLocations: no local node
    2017-02-03 18:23:36.001 nameless:7f16c37fe700 [Catalog] getLocalStorageLocations: no local node
    2017-02-03 18:23:37.001 nameless:7f16c37fe700 [Catalog] getLocalStorageLocations: no local node
    [root@ip-10-232-132-5 v_test_db_node0003_catalog]#
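The startup.log excerpt above is a series of JSON objects written back to back, so the quickest way to read it is to look at the last stage a node reported. Here is a minimal Python sketch of that idea, assuming the file keeps the format shown in the post. The function names `read_startup_stages` and `last_stage` are made up for illustration and are not part of any Vertica tool.

```python
import json


def read_startup_stages(text):
    """Parse back-to-back JSON objects (the startup.log layout shown above) into a list of dicts."""
    decoder = json.JSONDecoder()
    entries = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over it manually.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        entries.append(obj)
    return entries


def last_stage(text):
    """Return the 'stage' of the final entry, or None if the log is empty."""
    entries = read_startup_stages(text)
    return entries[-1]["stage"] if entries else None


# Two entries copied from the startup.log posted above.
SAMPLE = '''
{
"node" : "v_test_db_node0002",
"stage" : "Connecting to Spread",
"text" : "Connecting to spread /opt/vertica/spread/tmp/4803",
"timestamp" : "2017-02-03 18:14:39.908"
}
{
"node" : "v_test_db_node0002",
"stage" : "Waiting for Cluster Invite",
"text" : "Ready to be invited",
"timestamp" : "2017-02-03 18:14:48.909"
}
'''

if __name__ == "__main__":
    print(last_stage(SAMPLE))
```

In this thread the last stage on node0002 is "Waiting for Cluster Invite", which means the node reached its local spread daemon but never heard from the rest of the cluster. That points toward the network between the nodes rather than the local install.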

  • I'm in AWS and am going to try the install again using the --point-to-point flag.

  • Found the issue. The security groups were misconfigured: they only had TCP for DNS and were missing the UDP rules for spread.

    This prior thread pointed me in the right direction. https://forum.vertica.com/discussion/comment/237056/#Comment_237056

    and the official documentation was helpful as well. https://my.vertica.com/docs/Ecosystem/Amazon/UsingVerticaonAWSHTML/Default.htm#Authoring/UsingVerticaOnAWS/CreatingASecurityGroup.htm?TocPath=Configure%20Your%20Network|_____7
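For anyone who hits the same symptom, a quick check is whether UDP datagrams actually get from one node to another. Below is a rough Python sketch of such a probe, not an official Vertica tool. The port 4803 comes from the `"-P" "4803"` argument in the adminTools log above; the function names are hypothetical. It assumes you can run `echo_once` on one node and `udp_reachable` on another, with Vertica and spread stopped on the listening node so the port is free to bind.

```python
import socket
import threading

# Taken from the "-P" "4803" argument in the adminTools log above (assumed to be the spread port).
SPREAD_PORT = 4803


def echo_once(sock):
    """Receive one UDP datagram on an already-bound socket and send it straight back."""
    data, addr = sock.recvfrom(1024)
    sock.sendto(data, addr)


def udp_reachable(host, port, timeout=2.0):
    """Return True if a datagram sent to host:port is echoed back within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(b"ping", (host, port))
            data, _ = s.recvfrom(1024)
        except OSError:
            # Covers timeouts and ICMP-triggered errors such as connection refused.
            return False
        return data == b"ping"


def loopback_self_test():
    """Check the probe itself by echoing over 127.0.0.1 on an ephemeral port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    port = server.getsockname()[1]
    worker = threading.Thread(target=echo_once, args=(server,), daemon=True)
    worker.start()
    ok = udp_reachable("127.0.0.1", port)
    worker.join(timeout=2.0)
    server.close()
    return ok


if __name__ == "__main__":
    print("loopback probe ok:", loopback_self_test())
```

To use it across nodes, bind a UDP socket to `("0.0.0.0", SPREAD_PORT)` on one host and call `echo_once` on it, then call `udp_reachable("10.232.132.9", SPREAD_PORT)` from another host. A `False` result while TCP connections between the same hosts work would be consistent with a security group that blocks UDP.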
