Migrating a single-node (localhost-bound) Vertica database to a 3-node cluster
Hi Team,
I have been using a single-node Vertica cluster bound to 127.0.0.1. I am now planning to move the database from this node to a new 3-node cluster. I have already changed the node's IP to a non-loopback address using "admintools -t re_ip -f mapfile.txt". I changed the IP only to avoid any possible backup/restore issues.
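For reference, a minimal sketch of preparing the map file for that re_ip step. It assumes the simple "old_ip new_ip" one-pair-per-line format, and my understanding is that re_ip has to be run with the database stopped. 10.0.0.11 is a hypothetical stand-in for the new address, and write_re_ip_map is a made-up helper name, not part of Vertica.

```bash
#!/usr/bin/env bash
# Sketch: write the map file for "admintools -t re_ip -f <file>".
# Assumed format: one "old_ip new_ip" pair per line. 10.0.0.11 is hypothetical.

write_re_ip_map() {   # hypothetical helper: write_re_ip_map OLD_IP NEW_IP OUTFILE
  local old_ip="$1" new_ip="$2" out="$3"
  local ipv4='^([0-9]{1,3}\.){3}[0-9]{1,3}$'
  # Refuse anything that is not a dotted-quad IPv4 address (e.g. "localhost").
  if ! [[ $old_ip =~ $ipv4 && $new_ip =~ $ipv4 ]]; then
    echo "not an IPv4 address: $old_ip / $new_ip" >&2
    return 1
  fi
  printf '%s %s\n' "$old_ip" "$new_ip" > "$out"
}

map=$(mktemp)
write_re_ip_map 127.0.0.1 10.0.0.11 "$map"
cat "$map"
# With the database stopped, the file would then be passed to:
#   admintools -t re_ip -f "$map"
```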
AFAIK, by design, a full backup from a single-node cluster cannot be restored to a 3-node cluster. So I decided to move to a brand-new single-node cluster (bound to x.x.x.x, with the same database name) and restored the full backup from the source cluster. That worked fine.
Now I want to add 2 new nodes to this single-node cluster (IP = x.x.x.x). I used the command below to add 2 hosts to the cluster (x.x.x.x is node1):
/opt/vertica/sbin/update_vertica --add-host y.y.y.y,z.z.z.z --dba-user verticauser --license ~/123.dat --failure-threshold FAIL --rpm ~/vertica-9.2.1-0.x86_64.RHEL6.rpm
It seemed to work: the command didn't throw any errors. However, I am not sure what the best way is to verify that the hosts were added to the cluster correctly.
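One way to check that update_vertica registered the new hosts is to look at the [Cluster] section of /opt/vertica/config/admintools.conf on the existing node. This is a minimal sketch assuming that section holds a line of the form "hosts = ip1,ip2,ip3" (an assumption about the file layout; compare with your own file). The IPs are hypothetical placeholders for x.x.x.x, y.y.y.y and z.z.z.z, and check_cluster_hosts is a made-up helper name.

```bash
#!/usr/bin/env bash
# Sketch: verify that expected hosts appear in the [Cluster] "hosts =" line
# of admintools.conf. The file layout is an assumption; check your own file.

check_cluster_hosts() {   # hypothetical helper: check_cluster_hosts CONF HOST...
  local conf="$1"; shift
  local hosts h missing=0
  # Print the value of "hosts" in the [Cluster] section, whitespace removed.
  hosts=$(awk -F'=' '
    /^\[/ { section = $0; next }
    section == "[Cluster]" && $1 ~ /^hosts[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2 }
  ' "$conf")
  for h in "$@"; do
    # Wrap both sides in commas so 10.0.0.1 does not match 10.0.0.12.
    if [[ ",$hosts," == *",$h,"* ]]; then
      echo "OK      $h"
    else
      echo "MISSING $h"
      missing=1
    fi
  done
  return "$missing"
}

# Demo against a throwaway file; on a real node use /opt/vertica/config/admintools.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[Cluster]
hosts = 10.0.0.11,10.0.0.12,10.0.0.13
EOF
check_cluster_hosts "$conf" 10.0.0.11 10.0.0.12 10.0.0.13
```

A non-zero return code means at least one expected host is not listed, which would suggest the host addition did not complete.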
Then, I tried to add the database to the new hosts.
admintools -t db_add_node -d testdb -p novell -s y.y.y.y,z.z.z.z
This looks like it worked as well. I expected this step to create the database directories in the same location on the new hosts, but I cannot see all the files under the catalog directory (e.g. /v_testdb_node0002_catalog/). Only the spread.conf and vertica.conf files exist.
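My understanding is that db_add_node should also add one entry per new node to the [Nodes] section of admintools.conf, so listing those entries is a quick sanity check. A minimal sketch, assuming entries of the form "v_<db>_node000N = ip,catalog_path,data_path" (an assumption about the file layout); the sample IPs and paths are hypothetical, and list_db_nodes is a made-up helper name.

```bash
#!/usr/bin/env bash
# Sketch: list the nodes admintools.conf knows about for one database.
# Assumed [Nodes] line layout: v_<db>_node000N = ip,catalog_path,data_path

list_db_nodes() {   # hypothetical helper: list_db_nodes CONF DBNAME
  local conf="$1" db="$2"
  awk -v prefix="v_${db}_node" '
    /^\[/ { in_nodes = ($0 == "[Nodes]"); next }
    in_nodes && index($1, prefix) == 1 {
      split($0, kv, "=")       # kv[2] holds " ip,catalog_path,data_path"
      gsub(/[ \t]/, "", kv[2])
      split(kv[2], f, ",")
      print $1, f[1]           # node name and its IP address
    }
  ' "$conf"
}

# Demo against a throwaway file; on a real node use /opt/vertica/config/admintools.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[Nodes]
node0001 = 10.0.0.11,/vdata,/vdata
v_testdb_node0001 = 10.0.0.11,/vdata,/vdata
v_testdb_node0002 = 10.0.0.12,/vdata,/vdata
v_testdb_node0003 = 10.0.0.13,/vdata,/vdata
EOF
list_db_nodes "$conf" testdb
```

If node0002 and node0003 are missing from the output, db_add_node did not register them, which would be consistent with the nearly empty catalog directories.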
Issue:
Starting the database on both nodes throws an error:
*** Starting database: testdb ***
Starting nodes:
v_testdb_node0001 (x.x.x.x)
v_testdb_node0002 (y.y.y.y)
Error: the vertica process for the database is running on the following hosts:
y.y.y.y
This may be because the process has not completed previous shutdown activities. Please wait and retry again.
Database start up failed. Processes still running.
Found these errors in startup.logs on hosts:
.../v_testdb_node0002_catalog/startup.log: exception Can't open .../v_testdb_node0002_catalog/startup.log for reading: [Errno 2] No such file or directory:
I may be doing something completely wrong. Passwordless SSH is also set up correctly. It looks like the files the database needs (the catalog) are not being replicated to the other 2 nodes.
I would appreciate any help resolving this issue.
Regards,
SM
Best Answers
msanjib
startup.log
{
"goal" : 193483140,
"node" : "v_appdefenderdb_node0001",
"progress" : 171015310,
"stage" : "Read DataCollector",
"text" : "Inventory files (bytes)",
"timestamp" : "2020-01-22 10:59:34.843"
}
{
"goal" : 193483140,
"node" : "v_appdefenderdb_node0001",
"progress" : 191441417,
"stage" : "Read DataCollector",
"text" : "Inventory files (bytes)",
"timestamp" : "2020-01-22 10:59:34.912"
}
{
"node" : "v_appdefenderdb_node0001",
"stage" : "Check Storage",
"text" : "Removing unnecessary storage files",
"timestamp" : "2020-01-22 10:59:35.006"
}
{
"goal" : 24,
"node" : "v_appdefenderdb_node0001",
"progress" : 0,
"stage" : "Check Storage",
"text" : "Confirming storage matches catalog (files)",
"timestamp" : "2020-01-22 10:59:36.249"
}
{
"goal" : 24,
"node" : "v_appdefenderdb_node0001",
"progress" : 11,
"stage" : "Check Storage",
...
Abhishek_Rana Employee
Your understanding of -T (--point-to-point) is wrong.
Spread always uses the UDP protocol for its messages; this option only decides whether those messages are broadcast over the network switch or addressed to specific nodes for delivery.
In a point-to-point configuration, the topology is the same as with broadcast. However, a node sends a packet with a header to each node (IPv4 unicast), and the switch then routes that packet to the corresponding node. If the switch only has Vertica nodes, this option can generate more traffic. However, if the switch also serves other applications or nodes, using this option reduces the chance that packets are lost.
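To make that trade-off concrete, here is a rough back-of-the-envelope sketch. It is my own simplification, not a model of Spread's internals: with broadcast, a sender puts one datagram on the switch and every other host on that segment receives it; with point-to-point, the sender emits one unicast datagram per other Vertica node and nobody else sees them. spread_fanout is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Rough model only (not Spread internals): datagrams one sender emits for a
# single cluster message, and how many hosts have to receive them.

spread_fanout() {   # hypothetical helper: spread_fanout MODE VERTICA_NODES HOSTS_ON_SWITCH
  local mode="$1" nodes="$2" hosts="$3"
  case "$mode" in
    # One broadcast datagram reaches every other host on the segment.
    broadcast)      echo "sent=1 received_by=$(( hosts - 1 ))" ;;
    # One unicast datagram per other Vertica node; other hosts never see it.
    point-to-point) echo "sent=$(( nodes - 1 )) received_by=$(( nodes - 1 ))" ;;
    *)              echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

for hosts in 3 50; do
  for mode in broadcast point-to-point; do
    echo "3 nodes, $hosts hosts on switch, $mode: $(spread_fanout "$mode" 3 "$hosts")"
  done
done
```

With only the 3 Vertica nodes on the switch, point-to-point sends 2 datagrams where broadcast sends 1, and that gap grows with cluster size. With 50 hosts on the switch, the single broadcast lands on 49 machines, 47 of which are not Vertica nodes.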
Answers
Is the single node listed as localhost or 127.0.0.1 in /opt/vertica/config/admintools.conf? If so, you will not be able to add nodes to the cluster. Please review the "Important" comment at https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/ManageNodes/AddingNodes.htm?zoom_highlight="single node"
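A quick way to answer that question on the existing node is to grep admintools.conf for loopback addresses. A minimal sketch; the default path is the one named above, the sample file contents are hypothetical, and find_loopback_entries is a made-up helper name.

```bash
#!/usr/bin/env bash
# Sketch: report lines in admintools.conf that still reference a loopback address.

find_loopback_entries() {   # hypothetical helper: find_loopback_entries [CONF]
  local conf="${1:-/opt/vertica/config/admintools.conf}"
  # -n: show line numbers; -w: whole words only, so 127.0.0.10 is not reported.
  grep -nwE 'localhost|127\.0\.0\.1' "$conf"
}

# Demo against a throwaway file that still has loopback entries.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[Cluster]
hosts = 127.0.0.1

[Nodes]
v_testdb_node0001 = 127.0.0.1,/vdata,/vdata
EOF
if find_loopback_entries "$conf"; then
  echo "Loopback entries found: re-IP the node before adding hosts."
else
  echo "No loopback entries found."
fi
```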
To create a three-node cluster, I recommend the following procedure:
Thanks.
In fact, I followed exactly the same steps.
My cluster is not starting. Is there any way to recover at this point?
Regards,
SM
admintools -t start_db -d appdefenderdb -p
Starting nodes:
v_appdefenderdb_node0001 (x.x.x.x)
v_appdefenderdb_node0002 (y.y.y.y)
v_appdefenderdb_node0003 (z.z.z.z)
Starting Vertica on all nodes. Please wait, databases with a large catalog may take a while to initialize.
Checking database state
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
It is suggested that you continue waiting.
Nodes in TRANSITIONAL state: ,,
Nodes DOWN: v_appdefenderdb_node0001, v_appdefenderdb_node0002, v_appdefenderdb_node0003 (may be still initializing).
It is suggested that you continue waiting.
Do you want to continue waiting? (yes/no) [yes]
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Please check startup.log to see where it is failing.
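startup.log is a sequence of pretty-printed JSON objects like the excerpt in the best answer above, so the last complete object shows the stage a node reached before it stopped. A minimal sketch using awk (so it does not depend on jq being installed); it assumes each "key" : "value" pair sits on its own line as in that excerpt, and last_startup_stage is a hypothetical helper name. The sample reuses lines from the posted excerpt.

```bash
#!/usr/bin/env bash
# Sketch: print timestamp, stage and text of the last complete object in startup.log.
# Assumes one "key" : "value" pair per line, as in the excerpt posted in this thread.

last_startup_stage() {   # hypothetical helper: last_startup_stage STARTUP_LOG
  awk -F'"' '
    /^[ \t]*[{]/  { stage = text = ts = "" }             # new object: reset fields
    /"stage"/     { stage = $4 }
    /"text"/      { text = $4 }
    /"timestamp"/ { ts = $4 }
    /^[ \t]*[}]/  { last = ts "  " stage ": " text }     # object is complete
    END           { print last }
  ' "$1"
}

# Demo; on a node, point it at <catalog dir>/startup.log, e.g.
# /appdefender/db_data/verticauser/appdefenderdb/v_appdefenderdb_node0001_catalog/startup.log
log=$(mktemp)
cat > "$log" <<'EOF'
{
"node" : "v_appdefenderdb_node0001",
"stage" : "Check Storage",
"text" : "Removing unnecessary storage files",
"timestamp" : "2020-01-22 10:59:35.006"
}
{
"goal" : 24,
"node" : "v_appdefenderdb_node0001",
"progress" : 0,
"stage" : "Check Storage",
"text" : "Confirming storage matches catalog (files)",
"timestamp" : "2020-01-22 10:59:36.249"
}
{
"goal" : 24,
"progress" : 11,
"stage" : "Check Storage",
EOF
last_startup_stage "$log"
```

The trailing, unterminated object is ignored, so a log cut off mid-write still reports the last stage that was fully recorded.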
Please find the logs from vertica.log below. I don't see any clue in them; maybe I have to enable a debug-level flag to get complete information. The log below just keeps repeating. No luck so far; let me check all the network prerequisites one more time.
/appdefender/db_data/verticauser/appdefenderdb/v_appdefenderdb_node0001_catalog/vertica.log
2020-01-22 10:47:55.002 MetadataPoolMonitor:0x7f734bfff700 @v_appdefenderdb_node0001: 00000/7794: Updated metadata pool: Memory(KB): 22435
2020-01-22 10:47:57.010 Cluster Inviter:0x7f734bfff700 [Comms] My global sequence value is 131700
2020-01-22 10:47:59.001 Cluster Inviter:0x7f734bfff700 [Comms] My global sequence value is 131700
2020-01-22 10:48:01.001 Cluster Inviter:0x7f7359b0b700 [Comms] My global sequence value is 131700
2020-01-22 10:48:02.000 Timer Service:0x7f735b30e700 [Util] Task 'FeatureUseLogger' enabled
2020-01-22 10:48:02.000 Timer Service:0x7f735b30e700 [Util] Task 'LicenseSizeAuditor' enabled
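When vertica.log just keeps repeating, collapsing identical messages makes the loop easier to see. A minimal sketch, assuming each line starts with a date, a time and a "Thread:0xADDRESS" token as in the excerpt above; summarize_repeats is a hypothetical helper name, and the sample is the excerpt itself.

```bash
#!/usr/bin/env bash
# Sketch: count identical vertica.log messages after stripping the timestamp
# and the thread address, most frequent first.

summarize_repeats() {   # hypothetical helper: summarize_repeats VERTICA_LOG
  sed -E 's/^[0-9-]+ [0-9:.]+ //; s/:0x[0-9a-f]+//' "$1" | sort | uniq -c | sort -rn
}

log=$(mktemp)
cat > "$log" <<'EOF'
2020-01-22 10:47:55.002 MetadataPoolMonitor:0x7f734bfff700 @v_appdefenderdb_node0001: 00000/7794: Updated metadata pool: Memory(KB): 22435
2020-01-22 10:47:57.010 Cluster Inviter:0x7f734bfff700 [Comms] My global sequence value is 131700
2020-01-22 10:47:59.001 Cluster Inviter:0x7f734bfff700 [Comms] My global sequence value is 131700
2020-01-22 10:48:01.001 Cluster Inviter:0x7f7359b0b700 [Comms] My global sequence value is 131700
2020-01-22 10:48:02.000 Timer Service:0x7f735b30e700 [Util] Task 'FeatureUseLogger' enabled
2020-01-22 10:48:02.000 Timer Service:0x7f735b30e700 [Util] Task 'LicenseSizeAuditor' enabled
EOF
summarize_repeats "$log"
```

On this excerpt, the "Cluster Inviter ... My global sequence value" message comes out on top with a count of 3, which is the line worth searching for when node 1 seems to wait for peers that never answer.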
Thanks
SM
Finally, it worked.
I believe the cluster uses UDP broadcast traffic by default for cluster communication. I should have used "-T --point-to-point" when adding the hosts with the install_vertica or update_vertica script.
That way I would force the cluster nodes (Spread) to communicate using TCP instead of UDP. I believe this is the best option, as it will not flood the network.
For testing purposes I simply disabled the firewall, and it worked.
Thanks for helping me troubleshoot.
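Rather than leaving the firewall disabled, the usual approach is to allow the Vertica and Spread ports between the cluster nodes. A minimal sketch that only prints iptables commands (RHEL 6, matching the RPM used above) so they can be reviewed before applying. The port list is my recollection of the port table in the Vertica installation guide and should be checked against the docs for your version; 10.0.0.0/24 is a hypothetical placeholder for the subnet the nodes share, and print_iptables_rules is a made-up helper name.

```bash
#!/usr/bin/env bash
# Sketch: PRINT (not apply) iptables rules that open Vertica/Spread ports to the
# cluster subnet. Port list is from memory of the install guide; verify it.

vertica_ports=(
  "22/tcp"     # sshd (admintools, install scripts)
  "5433/tcp"   # Vertica client connections
  "5434/tcp"   # intra-cluster communication
  "5433/udp"   # Spread monitoring
  "4803/tcp"   # Spread client connections
  "4803/udp"   # Spread daemon-to-daemon
  "4804/udp"   # Spread daemon-to-daemon
  "6543/udp"   # Spread monitor-to-daemon
)

print_iptables_rules() {   # hypothetical helper: print_iptables_rules SUBNET
  local subnet="$1" spec port proto
  for spec in "${vertica_ports[@]}"; do
    port=${spec%/*}
    proto=${spec#*/}
    # -I (insert at top) so the rule lands before a trailing REJECT rule.
    echo "iptables -I INPUT -s $subnet -p $proto --dport $port -j ACCEPT"
  done
}

print_iptables_rules 10.0.0.0/24   # hypothetical cluster subnet
```

Inserting with -I rather than appending with -A matters on a stock RHEL 6 ruleset, which ends the INPUT chain with a REJECT rule; appended rules would never be reached. Once the printed rules look right, they can be run as root and persisted with "service iptables save".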
Good to know it worked. Please review the link below and make sure everything is set according to the recommendations, to avoid performance hits and other issues.
https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/InstallationGuide/BeforeYouInstall/OsConfigTaskOverview.htm?