Migrating a single-node (localhost-bound) Vertica database to a 3-node cluster

msanjib · January 2020

Hi Team,
I have been using a single-node (127.0.0.1) Vertica cluster. I am now planning to move the DB from this node to a new 3-node cluster. In fact, I have already changed the IP to a new non-local IP using "admintools -t re_ip -f mapfile.txt". I changed the IP just to avoid any backup/restore issues.

AFAIK, by design we can't restore a full backup from a single-node cluster to a 3-node cluster. So I decided to move to a brand-new single-node cluster (bound to x.x.x.x, with the same DB name). I restored the full backup from the source cluster, and it worked fine.

Next, I wanted to add 2 new nodes to the above single-node cluster (IP = x.x.x.x). I used the command below to add 2 hosts to the cluster (x.x.x.x is node1):
/opt/vertica/sbin/update_vertica --add-host y.y.y.y, z.z.z.z --dba-user verticauser --license ~/123.dat --failure-threshold FAIL --rpm ~/vertica-9.2.1-0.x86_64.RHEL6.rpm
It worked fine. I am not sure what the best way is to verify that the hosts were added to the cluster correctly. The output didn't show any errors.

Then, I tried to add the database to the new hosts.
admintools -t db_add_node -d testdb -p novell -s y.y.y.y , z.z.z.z
It looks like that worked fine as well. I believe this step should create the database directory in the same location. However, I am not able to see all the files under the catalog directory (e.g. /v_testdb_node0002_catalog/). Only the spread.conf and vertica.conf files exist.

Issue:
Starting the database on both nodes throws an error.

*** Starting database: testdb ***
Starting nodes:
v_testdb_node0001 (x.x.x.x)
v_testdb_node0002 (y.y.y.y)
Error: the vertica process for the database is running on the following hosts:
y.y.y.y
This may be because the process has not completed previous shutdown activities. Please wait and retry again.
Database start up failed. Processes still running.
Found these errors in startup.logs on hosts:
.../v_testdb_node0002_catalog/startup.log: exception Can't open .../v_testdb_node0002_catalog/startup.logfor reading: [Errno 2] No such file or directory:

I may be doing something completely wrong. Passwordless SSH was also set up correctly. It looks like the files required for the database (catalog) are not being replicated to the other 2 nodes.
I would appreciate any help in resolving this issue.

Regards,
SM

msanjib · January 2020

startup.log

{
"goal" : 193483140,
"node" : "v_appdefenderdb_node0001",
"progress" : 171015310,
"stage" : "Read DataCollector",
"text" : "Inventory files (bytes)",
"timestamp" : "2020-01-22 10:59:34.843"
}
{
"goal" : 193483140,
"node" : "v_appdefenderdb_node0001",
"progress" : 191441417,
"stage" : "Read DataCollector",
"text" : "Inventory files (bytes)",
"timestamp" : "2020-01-22 10:59:34.912"
}
{
"node" : "v_appdefenderdb_node0001",
"stage" : "Check Storage",
"text" : "Removing unnecessary storage files",
"timestamp" : "2020-01-22 10:59:35.006"
}
{
"goal" : 24,
"node" : "v_appdefenderdb_node0001",
"progress" : 0,
"stage" : "Check Storage",
"text" : "Confirming storage matches catalog (files)",
"timestamp" : "2020-01-22 10:59:36.249"
}
{
"goal" : 24,
"node" : "v_appdefenderdb_node0001",
"progress" : 11,
"stage" : "Check Storage",
:

Abhishek_Rana · January 2020

Your understanding of -T --point-to-point is wrong.

Spread always uses the UDP protocol for its messages. What this option decides is whether those messages are broadcast over the network switch or sent to specific node addresses.

In point-to-point configuration, the topology is the same as Broadcast. However, a node can send a package with a header to each node (IPv4 unicast protocol), and the switch can then reroute that package to the corresponding node. If the switch only has Vertica nodes, this option could present more traffic. However, if the switch has other applications or nodes, using this option reduces the chance that the package is lost.
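
To illustrate the distinction described above, here is a minimal Python sketch of the two delivery styles over plain UDP sockets. It is not how spread itself is implemented; the function names, payload, and addresses are hypothetical and only show that both modes use UDP and differ in the destination address.

```python
import socket


def send_point_to_point(payload, peers):
    """Send one UDP datagram to each peer address (unicast).

    `peers` is a list of (host, port) tuples. Returns the number of
    datagrams handed to the network stack.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for addr in peers:
            sock.sendto(payload, addr)
            sent += 1
    return sent


def send_broadcast(payload, broadcast_addr, port):
    """Send a single UDP datagram to a subnet broadcast address.

    Still UDP; only the destination differs. SO_BROADCAST must be set
    before the OS allows sending to a broadcast address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, (broadcast_addr, port))
```

In the unicast case the sender produces one datagram per node, which is why the quoted text notes that it can mean more traffic on a switch that carries only Vertica nodes.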

Bryan_H · January 2020

Is the single node listed as localhost or 127.0.0.1 in /opt/vertica/config/admintools.conf? If so, you will not be able to add nodes to the cluster. Please review the "Important" comment at https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/ManageNodes/AddingNodes.htm?zoom_highlight="single node"
To create a three-node cluster, I recommend the following procedure:

1. Back up the existing single node.
2. Create a new node with a valid IP and/or hostname.
3. Restore the database on the new single-node instance.
4. Then add the two nodes to this new instance.
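
The admintools.conf check mentioned above can be sketched in Python. This assumes the file is INI-style with a [Cluster] section containing a comma-separated hosts entry; that layout is an assumption here, so confirm it against your own file. The function names are hypothetical.

```python
import configparser

# Names that indicate the node is bound to the loopback interface.
LOOPBACK_NAMES = {"localhost", "127.0.0.1", "::1"}


def cluster_hosts(conf_text):
    """Return the hosts listed under [Cluster] in admintools.conf text.

    Assumes an entry of the form `hosts = a,b,c`; returns an empty
    list if the section or the key is missing.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    raw = parser.get("Cluster", "hosts", fallback="")
    return [h.strip() for h in raw.split(",") if h.strip()]


def loopback_hosts(conf_text):
    """Return the cluster hosts that are loopback names or addresses."""
    return [h for h in cluster_hosts(conf_text) if h.lower() in LOOPBACK_NAMES]
```

For example, reading /opt/vertica/config/admintools.conf and passing its contents to loopback_hosts() should return an empty list before you try to add nodes. Under the same assumption, cluster_hosts() may also be one way to confirm that newly added hosts were recorded.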

msanjib · January 2020

Thanks.
In fact, I followed the exact same steps, but my cluster is not starting. Is there any way to recover from this point?

Regards,
SM

admintools -t start_db -d appdefenderdb -p

Starting nodes:
v_appdefenderdb_node0001 (x.x.x.x)
v_appdefenderdb_node0002 (y.y.y.y)
v_appdefenderdb_node0003 (z.z.z.z)
Starting Vertica on all nodes. Please wait, databases with a large catalog may take a while to initialize.
Checking database state
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
It is suggested that you continue waiting.
Nodes in TRANSITIONAL state: ,,
Nodes DOWN: v_appdefenderdb_node0001, v_appdefenderdb_node0002, v_appdefenderdb_node0003 (may be still initializing).
It is suggested that you continue waiting.
Do you want to continue waiting? (yes/no) [yes]
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)
Node Status: v_appdefenderdb_node0001: (DOWN) v_appdefenderdb_node0002: (DOWN) v_appdefenderdb_node0003: (DOWN)

SruthiA · January 2020

Please check startup.log to find where it is failing.
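
Since startup.log (as posted earlier in this thread) is a sequence of JSON objects written one after another, a small Python sketch can report the last stage that was reached. The function names are hypothetical, and the parser simply stops at the first truncated or non-JSON entry.

```python
import json


def read_startup_entries(text):
    """Parse concatenated JSON objects from startup.log text.

    Stops quietly at the first entry that cannot be decoded, for
    example a truncated object at the end of the file.
    """
    decoder = json.JSONDecoder()
    entries = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        try:
            obj, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            break
        if isinstance(obj, dict):
            entries.append(obj)
    return entries


def last_stage(text):
    """Return (stage, text) of the last complete entry, or None."""
    entries = read_startup_entries(text)
    if not entries:
        return None
    last = entries[-1]
    return last.get("stage"), last.get("text")
```

Running last_stage() on the contents of each node's startup.log gives a quick view of where each node stopped making progress.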

msanjib · January 2020

Please find below the logs from the vertica.log file. I can't find any clue in them. Maybe I have to enable a debug-level flag to get the complete information. The log below just keeps repeating. No luck so far; let me check all the network prerequisites one more time.
/appdefender/db_data/verticauser/appdefenderdb/v_appdefenderdb_node0001_catalog/vertica.log

2020-01-22 10:47:55.002 MetadataPoolMonitor:0x7f734bfff700 @v_appdefenderdb_node0001: 00000/7794: Updated metadata pool: Memory(KB): 22435
2020-01-22 10:47:57.010 Cluster Inviter:0x7f734bfff700 [Comms] My global sequence value is 131700
2020-01-22 10:47:59.001 Cluster Inviter:0x7f734bfff700 [Comms] My global sequence value is 131700
2020-01-22 10:48:01.001 Cluster Inviter:0x7f7359b0b700 [Comms] My global sequence value is 131700
2020-01-22 10:48:02.000 Timer Service:0x7f735b30e700 [Util] Task 'FeatureUseLogger' enabled
2020-01-22 10:48:02.000 Timer Service:0x7f735b30e700 [Util] Task 'LicenseSizeAuditor' enabled
Thanks
SM

msanjib · January 2020

Finally, it worked.
I believe the cluster uses UDP broadcast traffic by default for cluster communication. I should have opted for "-T --point-to-point" when adding the hosts with the install_vertica or update_vertica script.
That way I would force the cluster nodes (spread) to communicate using TCP instead of UDP. I believe this is the best option, as it will not flood the network with traffic.

I just disabled the firewall for testing purposes and it worked.
Thanks for helping me troubleshoot.
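
As an alternative to turning the firewall off entirely, a rough Python sketch like the one below can probe whether given TCP ports are reachable between nodes. The helper names are hypothetical, and the list of ports must come from the Vertica documentation for your version. It only tests TCP, so it cannot confirm that UDP traffic between nodes is getting through.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def probe(hosts, ports, timeout=2.0):
    """Check every host/port pair and return {(host, port): bool}."""
    return {
        (host, port): tcp_port_open(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Calling probe() from each node against the other nodes' addresses, with the ports you expect to be open, shows which pairs are blocked.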

SruthiA · January 2020

Good to know it worked. Please review the link below and make sure everything is set as recommended, to avoid performance hits and other issues.

https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/InstallationGuide/BeforeYouInstall/OsConfigTaskOverview.htm?
