Vertica 6.1.2 in AWS Cluster getting error when creating database in cluster - getLocalNodeInternal
I am doing a POC for a client who is running Vertica 6.1.2 in AWS.
I have a 2-node cluster, and I was able to install the software with the cluster option with no problems.
When I create the database, it gets created on the first node, but the 2nd node keeps showing as down. In vertica.log under the catalog directory I see the following message, which keeps repeating:
nameless:0x641aeb0 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:48.001 nameless:0x641aeb0 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:49.002 nameless:0x641aeb0 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:49.004 Cluster Inviter:0x7fefe800f310 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:49.004 Cluster Inviter:0x7fefe800f310 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:50.001 nameless:0x641aeb0 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:51.002 nameless:0x641aeb0 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:51.006 Cluster Inviter:0x7fefe800e700 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:51.006 Cluster Inviter:0x7fefe800e700 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:52.002 nameless:0x641aeb0 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:53.002 Cluster Inviter:0x7fefe800e700 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:53.002 Cluster Inviter:0x7fefe800e700 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:53.003 nameless:0x641aeb0 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:54.000 Timer Service:0x6422050 [Util] <INFO> Task 'AnalyzeRowCount' enabled
2014-09-10 16:40:54.000 Timer Service:0x6422050 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
2014-09-10 16:40:54.000 Timer Service:0x6422050 [Util] <INFO> Task 'LicenseSizeAuditor' enabled
2014-09-10 16:40:54.001 nameless:0x641aeb0 [Catalog] <WARNING> Catalog::getLocalNodeInternal empty node name
Has anyone encountered this, or does anyone have a clue?
The firewall is off and SELinux is disabled.
ssh between the nodes works fine.
Vertica
5433 TCP (All connections)
Spread
4803 TCP (Client connections)
4803 UDP (Daemon <-> Daemon)
4804 UDP (Daemon <-> Daemon)
I checked the ports and they are available on both nodes.
Node 1
tcp 0 0 0.0.0.0:5433 0.0.0.0:* LISTEN
tcp 0 0 172.31.27.164:5433 172.31.27.164:57390 ESTABLISHED
tcp 0 0 172.31.27.164:57390 172.31.27.164:5433 ESTABLISHED
tcp 0 0 :::5433 :::* LISTEN
udp 0 0 0.0.0.0:5433 0.0.0.0:*
testclust [root@vertn1 ~]# netstat -an | grep 480
tcp 0 0 0.0.0.0:54804 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:4803 0.0.0.0:* LISTEN
tcp 0 0 172.31.27.164:4803 0.0.0.0:* LISTEN
udp 0 0 127.0.0.1:4803 0.0.0.0:*
udp 0 0 172.31.27.164:4803 0.0.0.0:*
udp 0 0 172.31.31.255:4803 0.0.0.0:*
udp 0 0 127.0.0.1:4804 0.0.0.0:*
udp 0 0 172.31.27.164:4804 0.0.0.0:*
unix 2 [ ACC ] STREAM LISTENING 32701 /tmp/4803
unix 3 [ ] STREAM CONNECTED 35357 /tmp/4803
testclust [root@vertn1 ~]#
Node 2
tcp 0 0 0.0.0.0:5433 0.0.0.0:* LISTEN
tcp 0 0 172.31.27.164:5433 172.31.27.164:57390 ESTABLISHED
tcp 0 0 172.31.27.164:57390 172.31.27.164:5433 ESTABLISHED
tcp 0 0 :::5433 :::* LISTEN
udp 0 0 0.0.0.0:5433 0.0.0.0:*
testclust [root@vertn1 ~]# netstat -an | grep 480
tcp 0 0 0.0.0.0:54804 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:4803 0.0.0.0:* LISTEN
tcp 0 0 172.31.27.164:4803 0.0.0.0:* LISTEN
udp 0 0 127.0.0.1:4803 0.0.0.0:*
udp 0 0 172.31.27.164:4803 0.0.0.0:*
udp 0 0 172.31.31.255:4803 0.0.0.0:*
udp 0 0 127.0.0.1:4804 0.0.0.0:*
udp 0 0 172.31.27.164:4804 0.0.0.0:*
unix 2 [ ACC ] STREAM LISTENING 32701 /tmp/4803
unix 3 [ ] STREAM CONNECTED 35357 /tmp/4803
Could anything else be the issue?
I have also installed using node names and then using IP addresses, with the same result.
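For anyone repeating this check, here is a minimal Python sketch that looks for the exact ports from the list above in `netstat -an` output, instead of eyeballing a `grep 480` (which also matches unrelated ports such as 54804). The function names are made up for this example, and the column layout is assumed to be the Linux one shown in the output above. It only shows that the ports are bound locally, not that the nodes can actually reach each other.

```python
# Ports from the list in this thread: Vertica 5433/tcp, Spread 4803/tcp, 4803/udp, 4804/udp.
REQUIRED = {("tcp", 5433), ("tcp", 4803), ("udp", 4803), ("udp", 4804)}


def bound_ports(netstat_text):
    """Return the set of (protocol, port) pairs bound locally in `netstat -an` output."""
    found = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local-address foreign-address [state]
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue  # skips unix sockets, headers and shell prompts
        proto = fields[0][:3]  # fold tcp6/udp6 into tcp/udp
        if proto == "tcp" and (len(fields) < 6 or fields[5] != "LISTEN"):
            continue  # ignore established connections, keep listeners only
        port = fields[3].rsplit(":", 1)[-1]
        if port.isdigit():
            found.add((proto, int(port)))
    return found


def missing_ports(netstat_text, required=REQUIRED):
    """Return the required (protocol, port) pairs that are not bound, sorted."""
    return sorted(set(required) - bound_ports(netstat_text))
```

Feed it the text of `netstat -an` captured on each node; an empty list from `missing_ports` means all four ports are bound on that node.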
This error is normally a Spread issue. These are the steps to debug Spread. You have already done step 3, so please check the other steps. Step 4 will tell you whether Spread is working correctly:
1- If you are on Amazon EC2, make sure that you used the -N and -T options when you installed Vertica.
2- Check the /opt/vertica/config/vspread.conf files on all nodes; they need to be identical.
If Vertica was installed with the -N and -T options, each node should have its own segment, like this:
Spread_Segment 10.104.33.179:4803 {
N010104033179 10.104.33.179 {
10.104.33.179
127.0.0.1
}
}
Spread_Segment 10.104.87.247:4803 {
N010104087247 10.104.87.247 {
10.104.87.247
127.0.0.1
}
}
A vspread.conf file with just one segment containing all the nodes, like this, would be incorrect:
Spread_Segment 10.104.33.179:4803 {
N010010001023 10.10.1.23 {
10.104.33.179
127.0.0.1
}
N010010001024 10.104.87.247 {
10.104.87.247
127.0.0.1
}
}
3- If the vspread configuration is fine, verify that the ports are open on the nodes. Spread needs ports 4803 and 4804.
You can verify this with a netstat test, for example:
# sudo netstat -uatp | grep 480
tcp 0 0 localhost.localdomain:4803 *:* LISTEN 2374/spread
tcp 0 0 priv-mymachine1.vertica:4803 *:* LISTEN 2374/spread
tcp 0 0 localhost.localdomain:smtp *:* LISTEN 4800/sendmail: acce
udp 0 0 localhost.localdomain:4803 *:* 2374/spread
udp 0 0 priv-mymachine.vertica:4803 *:* 2374/spread
udp 0 0 192.168.xxx.xxx:4803 *:* 2374/spread
udp 0 0 localhost.localdomain:4804 *:* 2374/spread
udp 0 0 priv-mymachine1.verti:4804 *:* 2374/spread
4- Run a Spread test with spuser to see whether the nodes are at least communicating with each other.
You can run spuser -r like this:
/opt/vertica/spread/bin/spuser -r
At the prompt User >
enter "j test"
j is for join, and you are creating a group called test.
Repeat this on all nodes to confirm that they are connecting to each other. You should see a new connection come in as each node joins.
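The segment check from step 2 can also be scripted. Below is a rough Python sketch that reads the text of a vspread.conf and reports any Spread_Segment holding more than one node. The function names are invented for this example, and the parser assumes the file looks like the two samples shown in step 2 (one `Spread_Segment ... {` line, one `name address {` line per node, closing braces on their own lines); a real file with comments or other directives may need more handling.

```python
import re


def nodes_per_segment(conf_text):
    """Map each Spread_Segment address to the node names declared inside it."""
    segments = {}
    current = None  # address of the segment currently being read
    depth = 0       # brace nesting level inside that segment
    for raw in conf_text.splitlines():
        line = raw.strip()
        seg = re.match(r"Spread_Segment\s+(\S+)\s*\{", line)
        if seg:
            current = seg.group(1)
            segments[current] = []
            depth = 1
            continue
        if current is None:
            continue  # anything outside a segment is ignored
        if line.endswith("{"):
            if depth == 1:
                # A block opened directly inside a segment is a node entry.
                segments[current].append(line.split()[0])
            depth += 1
        elif line == "}":
            depth -= 1
            if depth == 0:
                current = None
    return segments


def shared_segments(conf_text):
    """Return only the segments that contain more than one node."""
    return {
        addr: names
        for addr, names in nodes_per_segment(conf_text).items()
        if len(names) > 1
    }
```

For the first sample in step 2, `shared_segments` returns an empty dict; for the second it returns the single segment with both node names. To cover the "identical on all nodes" part, the same file text from each node can simply be compared for equality.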
You were right, it was a Spread issue. I uninstalled Vertica, reinstalled it with the -N and -T options, and then tested Spread. I did not know that -N and -T have to be used in AWS.
Are -N and -T also required for the later versions, 7.x and 7.1.x?
Thank you again, I really appreciate your help.
Glad that you resolved your issue.
Eugenia