Error while creating/starting database in a 3-node Vertica cluster.

Hi, I am using Vertica Community Edition 7.2.3 and am setting up a 3-node cluster on an AWS VPC. My install_vertica script runs fine. While creating the database, the node from which I issue the command via adminTools comes up, but the other two nodes stay down:

    Node Status: v_mpp_test_node0001: (DOWN) v_mpp_test_node0002: (UP) v_mpp_test_node0003: (DOWN)

All the requisite ports listed in 'my.vertica.com/docs/Ecosystem/Amazon/HP_Vertica_7.1.x_Vertica_AWS.pdf' are open, and passwordless SSH is set up between the three servers. Here is the netstat output:

    [root@ip-10-0-3-xxx ec2-user]# netstat -an | egrep 'tcp|udp'
    tcp 0 0 10.0.2.185:4803 0.0.0.0:* LISTEN
    tcp 0 0 0.0.0.0:5444 0.0.0.0:* LISTEN
    tcp 0 0 0.0.0.0:36582 0.0.0.0:* LISTEN
    tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN
    tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
    tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN
    tcp 0 0 0.0.0.0:5433 0.0.0.0:* LISTEN
    tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
    tcp 0 0 0.0.0.0:5434 0.0.0.0:* LISTEN
    tcp 0 0 10.0.2.185:22 10.0.200.5:53124 ESTABLISHED
    tcp 0 0 10.0.2.185:5434 10.0.2.185:36504 ESTABLISHED
    tcp 0 0 10.0.2.185:36504 10.0.2.185:5434 ESTABLISHED
    tcp 0 0 :::111 :::* LISTEN
    tcp 0 0 :::22 :::* LISTEN
    tcp 0 0 ::1:631 :::* LISTEN
    tcp 0 0 :::5433 :::* LISTEN
    tcp 0 0 ::1:25 :::* LISTEN
    udp 0 0 0.0.0.0:111 0.0.0.0:*
    udp 0 0 0.0.0.0:631 0.0.0.0:*
    udp 0 0 10.0.2.185:123 0.0.0.0:*
    udp 0 0 127.0.0.1:123 0.0.0.0:*
    udp 0 0 0.0.0.0:123 0.0.0.0:*
    udp 0 0 0.0.0.0:33666 0.0.0.0:*
    udp 0 0 0.0.0.0:935 0.0.0.0:*
    udp 0 0 10.0.2.185:5433 0.0.0.0:*
    udp 0 0 10.0.2.185:4803 0.0.0.0:*
    udp 0 0 10.0.2.185:4804 0.0.0.0:*
    udp 0 0 0.0.0.0:68 0.0.0.0:*
    udp 0 0 0.0.0.0:42327 0.0.0.0:*
    udp 0 0 :::111 :::*
    udp 0 0 fe80::8bf:7fff:fe9f:bd1b:123 :::*
    udp 0 0 ::1:123 :::*
    udp 0 0 :::123 :::*
    udp 0 0 :::935 :::*

    [root@ip-10-0-2-185 ec2-user]# nc -vz -u 10.0.2.186 4803
    Connection to 10.0.2.186 4803 port [udp/notateit-disc] succeeded!
    [root@ip-10-0-2-185 ec2-user]# nc -vz -u 10.0.2.184 4803
    Connection to 10.0.2.184 4803 port [udp/notateit-disc] succeeded!

But I get the following error when running vnetperf:

    [dbadmin@ip-10-0-2-184 ~]$ /opt/vertica/bin/vnetperf
    2016-08-02_13:10:28,619 ERROR: [Connector Thread 10.0.2.186 0x7fe59578e700] Couldn't connect to 10.0.2.186 (family 2, attempt 0): Connection timed out; errno=110 (Connection timed out)
    2016-08-02_13:10:28,619 ERROR: [Connector Thread 10.0.2.185 0x7fe59618f700] Couldn't connect to 10.0.2.185 (family 2, attempt 0): Connection timed out; errno=110 (Connection timed out)
    2016-08-02_13:11:32,619 ERROR: [Connector Thread 10.0.2.186 0x7fe59578e700] Couldn't connect to 10.0.2.186 (family 2, attempt 1): Connection timed out; errno=110 (Connection timed out)
    2016-08-02_13:11:32,619 ERROR: [Connector Thread 10.0.2.185 0x7fe59618f700] Couldn't connect to 10.0.2.185 (family 2, attempt 1): Connection timed out; errno=110 (Connection timed out)
    2016-08-02_13:12:36,619 ERROR: [Connector Thread 10.0.2.186 0x7fe59578e700] Couldn't connect to 10.0.2.186 (family 2, attempt 2): Connection timed out; errno=110 (Connection timed out)
    2016-08-02_13:12:36,619 ERROR: [Connector Thread 10.0.2.185 0x7fe59618f700] Couldn't connect to 10.0.2.185 (family 2, attempt 2): Connection timed out; errno=110 (Connection timed out)
    2016-08-02_13:12:37,619 ERROR: [Connector Thread 10.0.2.186 0x7fe59578e700] Could not find anything to connect to for 10.0.2.186; errno=110 (Connection timed out)
    2016-08-02_13:12:37,619 ERROR: [Connector Thread 10.0.2.185 0x7fe59618f700] Could not find anything to connect to for 10.0.2.185; errno=110 (Connection timed out)
    2016-08-02_13:12:37,619 ERROR: [main 0x7fe596b92720] Caught error: Unable to connect to host 10.0.2.185:14159 Unable to connect to host 10.0.2.186:14159; errno=0 (Success)

Do I also have to open port 14159 specifically? What could be the issue? Please let me know if any specific log is required.
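A note on the tests above: nc -vz -u reporting "succeeded" for UDP 4803 only means that no ICMP port-unreachable reply came back. Because UDP is connectionless, it does not prove that the datagrams actually arrived. The vnetperf failure, by contrast, is a TCP connect ending in errno=110 (Connection timed out), which usually means packets were silently dropped (for example by a security group) rather than refused by the remote host. The sketch below is a hypothetical helper, not a Vertica tool: it probes TCP ports on peer nodes using bash's /dev/tcp and coreutils timeout, and labels each result open, closed (failed immediately, typically refused) or filtered (timed out). The function names and the port list are assumptions; the ports are simply the TCP ports that appear in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: probe TCP reachability from this node to its peers.
# PORTS is an assumption taken from the ports seen in this thread; adjust it.

# Map the exit status of `timeout ... /dev/tcp` to a label.
#   0   -> open     (TCP handshake completed)
#   124 -> filtered (timeout fired: packets were probably dropped,
#                    which is what vnetperf's errno=110 suggests)
#   *   -> closed   (connect failed immediately, usually "refused":
#                    the host answered but nothing listens on the port)
classify_status() {
  case "$1" in
    0)   echo open ;;
    124) echo filtered ;;
    *)   echo closed ;;
  esac
}

# probe_tcp HOST PORT [SECONDS] -> prints open | closed | filtered
probe_tcp() {
  local host=$1 port=$2 secs=${3:-3} rc
  # Opening fd 3 on /dev/tcp/HOST/PORT makes bash perform a TCP connect.
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
  rc=$?
  classify_status "$rc"
}

# probe_all HOST... -> one "host:port label" line per host and port
PORTS=(22 4803 5433 5434 5444)
probe_all() {
  local host port
  for host in "$@"; do
    for port in "${PORTS[@]}"; do
      printf '%s:%s %s\n' "$host" "$port" "$(probe_tcp "$host" "$port")"
    done
  done
}
```

For example, from 10.0.2.184 one might run probe_all 10.0.2.185 10.0.2.186. A port only shows as open while the matching process is running on the target, so a filtered result for a port that ss -auntp shows listening there points at the network path rather than at Vertica.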

Comments

  • It's a little difficult to read what you posted, but this looks like a connectivity issue. Can you please post a screenshot of the security group used to set up these instances? Second, can you post your admintools.conf and spread.conf? They are at /opt/vertica/config/admintools.conf and /path/to/catalog/v_asfasd_catalog/spread.conf.

     

    Thanks! 

  • Hi, this looks like it could be the UDP limitation in AWS. Did you configure the cluster using the point-to-point flag? The Vertica spread daemon uses UDP to communicate, but in many cloud/virtual environments, UDP broadcast traffic is either blocked or does not behave normally. In these situations, point-to-point communication must be enabled so that the nodes can communicate.

    -Chris
  • Sorry for the initial unformatted question. Following is the install command I run; it completes without any issue:

        /opt/vertica/sbin/install_vertica -T -s 10.0.2.184,10.0.2.185,10.0.2.186 -d /mpp01 -r /root/vertica-7.2.3-0.x86_64.RHEL6.rpm -i mpp.pem --dba-user-password-disabled --failure-threshold HINT -L /opt/vertica/config/licensing/ -Y

    When I try to create the database, I get the following message:

        Node Status: v_mpp_test_node0001: (UP) v_mpp_test_node0002: (DOWN) v_mpp_test_node0003: (DOWN)

    I was unable to attach the file, so I am pasting admintools.conf here:

        [dbadmin@ip-10-0-2-184 ~]$ cat /opt/vertica/config/admintools.conf
        [Configuration]
        last_port = 5433
        tmp_dir = /tmp
        default_base = /home/dbadmin
        format = 3
        install_opts = -T -s '10.0.2.184,10.0.2.185,10.0.2.186' -d /data01 -r '/root/vertica-7.2.3-0.x86_64.RHEL6.rpm' -i 'mpp.pem' --dba-user-password-disabled --failure-threshold HINT -L /opt/vertica/config/licensing/ -Y -w --control-network '10.0.3.255'
        at_debug = False
        spreadlog = True
        controlsubnet = 10.0.2.255
        ipv6 = False
        controlmode = pt2pt
        unreachable_host_caching = True
        admintools_config_version = 103

        [Cluster]
        hosts = 10.0.2.184,10.0.2.185,10.0.2.186

        [Nodes]
        v_mpp_test_node0001 = 10.0.2.184,/data01,/data01
        v_mpp_test_node0002 = 10.0.2.185,/data01,/data01
        v_mpp_test_node0003 = 10.0.2.186,/data01,/data01

        [SSHConfig]
        ssh_user =
        ssh_ident =
        ssh_options = -oConnectTimeout=30 -o TCPKeepAlive=no -o ServerAliveInterval=15 -o ServerAliveCountMax=2 -o StrictHostKeyChecking=no -o BatchMode=yes

        [Database:mpp_TEST]
        restartpolicy = ksafe
        port = 5433
        path = /data01/mpp_TEST
        nodes = v_mpp_test_node0001,v_mpp_test_node0002,v_mpp_test_node0003

    And spread.conf:

        [dbadmin@ip-10-0-2-184 v_adp_test_node0001_catalog]$ cat spread.conf
        # 6
        # Auto-generated by vertica - do not edit
        ActiveIPVersion = IPv4
        Spread_Segment 10.0.2.184:4803 {
            N010000002184 10.0.2.184 {
                10.0.2.184
            }
        }
        Spread_Segment 10.0.2.185:4803 {
            N010000002185 10.0.2.185 {
                10.0.2.185
            }
        }
        Spread_Segment 10.0.2.186:4803 {
            N010000002186 10.0.2.186 {
                10.0.2.186
            }
        }
        # begin end matter
        EventLogFile = /mpp01/MPP_TEST/spread.log
        EventTimeStamp = "[%a %d %b %Y %H:%M:%S]"
        DebugFlags = { CONFIGURATION MEMBERSHIP PRINT EXIT SESSION GROUPS }
        ExitOnIdle = yes

    Listening sockets on the node that came up:

        [dbadmin@ip-10-0-2-184 v_mpp_test_node0001_catalog]$ ss -auntp | grep 480
        udp UNCONN 0 0 10.0.2.184:4803 *:* users:(("spread",200555,3))
        udp UNCONN 0 0 10.0.2.184:4804 *:* users:(("spread",200555,4))
        tcp LISTEN 0 25 10.0.2.184:4803 *:* users:(("spread",200555,6))

        [dbadmin@ip-10-0-2-184 v_mpp_test_node0001_catalog]$ ss -auntp | grep 54
        udp UNCONN 0 0 10.0.2.184:5433 *:* users:(("vertica",200557,20))
        tcp LISTEN 0 5 *:5444 *:* users:(("python",200320,18))
        tcp LISTEN 0 128 :::5433 :::* users:(("vertica",200557,12))
        tcp LISTEN 0 128 *:5433 *:* users:(("vertica",200557,11))
        tcp LISTEN 0 128 *:5434 *:* users:(("vertica",200557,3))
        tcp ESTAB 0 0 10.0.2.184:55927 10.0.2.184:5434 users:(("vertica",200557,21))
        tcp ESTAB 0 0 10.0.2.184:5434 10.0.2.184:55927 users:(("vertica",200557,23))

    But on a node that did not come up, TCP port 5433 was not listening:

        [dbadmin@ip-10-0-2-185 ~]$ ss -auntp | grep 54
        udp UNCONN 0 0 10.0.3.185:5433 *:* users:(("vertica",174959,19))
        tcp LISTEN 0 5 *:5444 *:* users:(("python",174784,18))
        tcp LISTEN 0 128 *:5434 *:* users:(("vertica",174959,3))

        [dbadmin@ip-10-0-2-185 ~]$ ss -auntp | grep 480
        udp UNCONN 0 0 10.0.2.185:4803 *:* users:(("spread",174957,3))
        udp UNCONN 0 0 10.0.2.185:4804 *:* users:(("spread",174957,4))
        tcp LISTEN 0 25 10.0.2.185:4803 *:* users:(("spread",174957,6))

    However, if I do a fresh install and start the database from node 2, the ports come up on node 2 and not on the other nodes.

    Security group rules (this group is attached to the servers I am using):

        Type              Protocol      Port range     Source
        Custom TCP Rule   TCP           4803           sg-3a67ec41 (adpvpc-sgDatabase-R1CSBQW5TGFQ)
        Custom TCP Rule   TCP           5450           sg-7967ec02 (adpvpc-sgPublicELB-19BL6ROUAMZIP)
        SSH               TCP           22             sg-3a67ec41 (adpvpc-sgDatabase-R1CSBQW5TGFQ)
        SSH               TCP           22             sg-7867ec03 (adpvpc-sgManagement-3KBNAHZNU4IK)
        SSH               TCP           22             122.98.140.5/32
        SSH               TCP           22             10.0.2.210/32
        SSH               TCP           22             10.0.5.211/32
        SSH               TCP           22             23.20.185.205/32
        SSH               TCP           22             52.204.122.218/32
        Custom TCP Rule   TCP           14157 - 14161  sg-3a67ec41 (adpvpc-sgDatabase-R1CSBQW5TGFQ)
        Custom TCP Rule   TCP           5434           sg-3a67ec41 (adpvpc-sgDatabase-R1CSBQW5TGFQ)
        Custom TCP Rule   TCP           5434           sg-7867ec03 (adpvpc-sgManagement-3KBNAHZNU4IK)
        Custom TCP Rule   TCP           5434           10.0.5.211/32
        Custom TCP Rule   TCP           5434           23.20.185.205/32
        Custom TCP Rule   TCP           27017          sg-3a67ec41 (adpvpc-sgDatabase-R1CSBQW5TGFQ)
        Custom TCP Rule   TCP           27017          sg-7867ec03 (adpvpc-sgManagement-3KBNAHZNU4IK)
        Custom TCP Rule   TCP           5666           sg-7867ec03 (adpvpc-sgManagement-3KBNAHZNU4IK)
        MYSQL/Aurora      TCP           3306           sg-3a67ec41 (adpvpc-sgDatabase-R1CSBQW5TGFQ)
        MYSQL/Aurora      TCP           3306           sg-7867ec03 (adpvpc-sgManagement-3KBNAHZNU4IK)
        Custom TCP Rule   TCP           28017          sg-7867ec03 (adpvpc-sgManagement-3KBNAHZNU4IK)
        Custom UDP Rule   UDP           6543           sg-3a67ec41 (adpvpc-sgDatabase-R1CSBQW5TGFQ)
        Custom TCP Rule   TCP           4804           sg-3a67ec41 (adpvpc-sgDatabase-R1CSBQW5TGFQ)
        Custom ICMP Rule  Echo Request  N/A            sg-7867ec03 (adpvpc-sgManagement-3KBNAHZNU4IK)
        Custom TCP Rule   TCP           5433           sg-3a67ec41 (adpvpc-sgDatabase-R1CSBQW5TGFQ)
        Custom TCP Rule   TCP           5433           sg-7867ec03 (adpvpc-sgManagement-3KBNAHZNU4IK)
        Custom TCP Rule   TCP           5433           10.0.5.211/32

    I am also able to access the database from a node that is down:

        [dbadmin@ip-10-0-2-185 ~]$ admintools -t list_allnodes
         Node                | Host       | State | Version         | DB
        ---------------------+------------+-------+-----------------+----------
         v_mpp_test_node0001 | 10.0.2.184 | UP    | vertica-7.2.3.0 | MPP_TEST
         v_mpp_test_node0002 | 10.0.2.185 | DOWN  | vertica-7.2.3.0 | MPP_TEST
         v_mpp_test_node0003 | 10.0.2.186 | DOWN  | vertica-7.2.3.0 | MPP_TEST

        MPP_TEST=> \! hostname
        ip-10-0-2-185
        MPP_TEST=> select * from schemata;
         schema_id         | schema_name | schema_owner_id   | schema_owner | system_schema_creator | create_time                   | is_system_schema
        -------------------+-------------+-------------------+--------------+-----------------------+-------------------------------+------------------
                      8300 | v_internal  | 45035996273704962 | dbadmin      | dbadmin               | 2016-08-03 06:09:59.829151+00 | t
                      8301 | v_catalog   | 45035996273704962 | dbadmin      | dbadmin               | 2016-08-03 06:09:59.829201+00 | t
                      8302 | v_monitor   | 45035996273704962 | dbadmin      | dbadmin               | 2016-08-03 06:09:59.829216+00 | t
         45035996273704978 | public      | 45035996273704962 | dbadmin      |                       | 2016-08-03 06:09:58.53816+00  | f
         45035996273722426 | v_idol      | 45035996273704962 | dbadmin      |                       | 2016-08-03 06:22:08.438045+00 | f
         45035996273722496 | v_txtindex  | 45035996273704962 | dbadmin      |                       | 2016-08-03 06:22:08.802094+00 | f

    I even killed the vertica process on one of the nodes that is down, and I was still able to connect to the database from that node using admintools. Is this a setup issue with AWS? What am I missing?
  • Hi Nithesh,

       Thanks for the update. Yes, it looks like you are not using the --point-to-point option, which is a requirement in that environment. The installation completes without it (the installer does not detect when it is required), but you will then experience exactly the problem you are describing. The install command should be:

     

    /opt/vertica/sbin/install_vertica -T -s 10.0.2.184,10.0.2.185,10.0.2.186 --point-to-point -d /mpp01 -r /root/vertica-7.2.3-0.x86_64.RHEL6.rpm -i mpp.pem --dba-user-password-disabled --failure-threshold HINT -L /opt/vertica/config/licensing/ -Y

     

    This configures spread to use direct point-to-point communication between the nodes instead of broadcast, and everything should work.

     

    -Chris

     

  • Hi Chris, isn't the '-T' option equivalent to '--point-to-point'?
  • Nice catch, I didn't notice the -T.  Yes, according to the documentation they are the same option, although I always tend to use the -- flags.  

     

    That being said, let's talk about your network then. How do you have it configured? Can you ping between nodes? Is there a firewall running on the VMs?

  • I am not sure why I don't have options to format these posts.

    I followed the security rules at https://my.vertica.com/docs/7.0.x/HTML/Content/Authoring/UsingVerticaOnAWS/AddingRulesToASecurityGroup.htm

    I can't ping between the nodes; I only noticed that now, thanks.

    No firewall is running on the VMs.
  • You do have an option. There is a toolbar. Use the "Insert code" function.

     

    Also, check the version 7.2 documentation: AWS for 7.2

  • It was an AWS VPC issue: the security group was missing the rule that lets instances in the VPC communicate with each other. After allowing all inbound traffic between the nodes in the VPC, the installation worked fine. Thank you, Chris, for asking about pinging the nodes. It was a very basic thing, but since I had added the rules as per the Vertica documentation, I had overlooked that aspect earlier.
  • Nithesh,

         Glad to help, and happy it was something simple.  

     

    -Chris
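The admintools.conf pasted in this thread already contains controlmode = pt2pt, which is consistent with -T being the short form of --point-to-point and is why the discussion moved on to the network itself. One quick way to confirm this on any node is to read that key directly. The sketch below is a hypothetical helper (conf_value and is_point_to_point are illustrative names, not Vertica commands); it assumes the simple "key = value" layout shown in the pasted file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a "key = value" entry from admintools.conf.
# Assumes the flat key = value layout seen in the file pasted in this thread.

# conf_value FILE KEY -> prints the value of the first "KEY = value" line
conf_value() {
  local file=$1 key=$2
  awk -F'=' -v k="$key" '
    {
      name = $1
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the text before the first "="
    }
    name == k {
      sub(/^[^=]*=[ \t]*/, "")            # drop "key =" and keep the value
      sub(/[ \t]+$/, "")                  # drop trailing whitespace
      print
      exit
    }' "$file"
}

# is_point_to_point FILE -> exit status 0 when controlmode is pt2pt
is_point_to_point() {
  [ "$(conf_value "$1" controlmode)" = "pt2pt" ]
}
```

For the file posted in this thread, conf_value /opt/vertica/config/admintools.conf controlmode would print pt2pt. Values that themselves contain "=" (such as ssh_options) are kept intact, because only the first "=" on a line is treated as the separator.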

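The fix reported in this thread was a security group rule allowing all inbound traffic between the cluster nodes. Two gaps are visible in the rule list that was posted: it opens TCP 4803, 4804, 5433 and 5434 from sg-3a67ec41 but has no UDP rules for 4803, 4804 or 5433 (ports that ss shows spread and vertica bound to), and ICMP echo is allowed only from the management group, which matches the failed pings between nodes. A minimal sketch of an equivalent self-referencing rule with the AWS CLI follows. It assumes the CLI is installed and configured and that all three instances are members of sg-3a67ec41; allow_intra_group and DRY_RUN are illustrative names. In the console, the same rule is an inbound "All traffic" entry whose source is the group itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: allow all traffic between members of one security group.
# Assumes the AWS CLI is configured and every Vertica node uses this group.
# Set DRY_RUN=1 to print the command instead of calling AWS.

allow_intra_group() {
  local sg=$1
  # --protocol -1 means "all protocols"; --source-group points back at the
  # same group, so only instances that are members of it are allowed in.
  local cmd=(aws ec2 authorize-security-group-ingress
             --group-id "$sg" --protocol -1 --source-group "$sg")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 allow_intra_group sg-3a67ec41 prints the command for review, and dropping DRY_RUN applies it. Because the rule references the group rather than IP ranges, nodes added to the group later are covered automatically.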