Adding a Node to a Vertica Cluster
We have Vertica 9.2 installed on a 3-node cluster. Now we are trying to expand the cluster by adding one new node. I am following the documentation at https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/ManageNodes/AddingNodes.htm; however, I wonder if anyone has documentation that explicitly lists each step involved and the possible risks.
This is my first time installing Vertica, so I am a bit unclear about step 2, which states that 'before installing Vertica on the new node' we need to update the hosts configuration file on all existing nodes in the cluster. Can this be done after Vertica is installed on the new node? The node has not yet been added to the cluster and/or database, so what could go wrong if Vertica is installed on the new node before updating the hosts file on the other nodes in the cluster?
Has anyone experienced any issues while adding a node? How long does this process take? What if something goes wrong? Will Vertica revert itself to the old (before) structure, or are there additional steps to be done?
If all goes well, are there any additional steps to optimize Vertica and verify that the new node is working?
Thanks for sharing!
Comments
You need to make sure that the node being added has been configured as per the steps in the link below:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/InstallationGuide/BeforeYouInstall/BeforeYouInstallVertica.htm
You also need to have passwordless SSH enabled between the new node and the other 3 existing nodes of the cluster. Has that been done?
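A minimal sketch of setting this up from the new node, assuming the dbadmin user and the placeholder host names node1-node3:

# On the new node, as dbadmin: create a key pair if one does not already exist
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# Push the public key to each existing node (node1-node3 are placeholders)
for h in node1 node2 node3; do ssh-copy-id "dbadmin@$h"; done
# Verify: this should print the remote hostname without prompting for a password
ssh dbadmin@node1 hostname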
Thanks Sruthi. We plan to add a node this coming Friday.
I have another question. The adding-a-node doc states that "You will also need to edit the hosts configuration file on all of the existing nodes in the cluster to ensure they can resolve the new host." How do I go about doing that?
Hosts file as in
/etc/hosts
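For example, assuming the new node is node04 at 192.0.2.14 (both placeholders), each existing node's /etc/hosts would gain a line like:

192.0.2.14    node04.example.com    node04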
I don't think it is necessary if you are specifying IP addresses for the
install_vertica --hosts
part.
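For reference, the add-host install step in the Adding Nodes doc looks roughly like the following, run as root on one of the existing nodes (the IP address and rpm filename below are placeholders):

# Install Vertica on the new host and join it to the cluster definition
/opt/vertica/sbin/update_vertica --add-hosts 192.0.2.14 --rpm /tmp/vertica-9.2.x.x86_64.RHEL6.rpm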
Hello, we have added a new node successfully to the cluster and database, and re-balancing finished with a 'successful' message. All 4 nodes are UP, but there is no change in the storage space on the original 3 nodes. The new node shows almost 85% free while the old nodes remained at their original values. What happened?
I was under the impression that re-balancing would distribute all data evenly across the nodes. I don't have any old projections/partitions. K-safety is 1. All seems fine, yet the storage is not balanced... What am I missing?
What is the output of the following query?
SELECT host_name, disk_space_used_mb, disk_space_total_mb FROM host_resources;
Sruthi, here is the output:
Host4 is the newly added node. I have not yet added node4 to the backup, so right now the backup is executing on only 3 nodes. Can this contribute to the differences in sizes? We have 4 recent backups available and, from what I could see, the backups are only 4 KB in size.
Will running vbr -t init -c full_backup.ini on one of the original nodes in the cluster force the creation of backup locations and objects on the new node?
If there are backups on the nodes, then the following query will give a better view of the ROS container size per node:
SELECT node_name, SUM(used_bytes) FROM v_monitor.storage_containers GROUP BY node_name;
FYI - the size of the ROS containers seems to be balanced:
However, queries against storage_usage, disk_storage, and host_resources still show nodes 1-3 with very little available space (30%) while node4 shows 85% available space.
Do all the nodes have the same disk space? What is the output of df -h and tune2fs -l? You can get the partition to pass to tune2fs from the df -h output.
Here is the output:
The sizes are pretty much balanced except /root. Looking at the storage_usage output, /dev/sda2 shows a big difference. I wonder how to drill down to see what exactly is occupying that space. The only difference between the nodes is the backup: node4 has not yet been added, so it is missing the backups folder with all its objects. But I would not expect the backup to be so huge.
Can you share the output of the following from all nodes:
tune2fs -l /dev/partition
Replace /dev/partition with the device for the mount that Vertica uses.
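One way to find the right device, assuming the Vertica data lives under /home/dbadmin as the mapping below suggests:

# Show which device backs the Vertica data directory (the path is an assumption)
df -h /home/dbadmin
# Then inspect that device; substitute whatever device df reported, e.g. /dev/sda2
sudo tune2fs -l /dev/sda2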
Can we get a peek into the full_backup.ini file? Are you storing the backup on the nodes themselves?
Here is the content of the backup ini file:
[Misc]
snapshotName = backup_snapshot
restorePointLimit = 3
objectRestoreMode = createOrReplace
passwordFile = pwdfile
[Database]
dbName = VerticaDB
dbUser = dbadmin
[Transmission]
[Mapping]
v_node0001 = x.x.x.x:/home/dbadmin/backups
v_node0002 = x.x.x.x:/home/dbadmin/backups
v_node0003 = x.x.x.x:/home/dbadmin/backups
Your restorePointLimit is 3, which means vbr retains 3 earlier backups in addition to the most recent one, i.e. 4 in total, and you are storing them on your local filesystem. So your backups are occupying approximately 4 * 370 GB = 1,480 GB on the other 3 nodes. Hence you have less space on those three nodes.
Please review the link below on how to estimate disk space requirements for backup hosts.
https://www.vertica.com/docs/9.1.x/HTML/index.htm#Authoring/AdministratorsGuide/BackupRestore/ConfiguringBackupHosts.htm
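As a rough rule of thumb, a local vbr backup location needs about (restorePointLimit + 1) times the per-node data size. A quick sanity check with this thread's numbers (370 GB per node is the figure used above; the du output later in the thread shows ~823 GB, so treat this as a worst case, since unchanged files can be shared between restore points):

# Worst-case local backup footprint per node (bash arithmetic; numbers from this thread)
restore_point_limit=3   # from full_backup.ini
per_node_data_gb=370    # approximate data per node
echo "$(( (restore_point_limit + 1) * per_node_data_gb )) GB"   # prints: 1480 GB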
Sruthi - thanks for helping me understand. I have two questions:
1. Since we are mapping only nodes 1-3 in the backup, does it mean that our backups are not complete (since node4 is missing), or...
2. vbr backs up the entire database to each of the participating hosts, which means I have 3 hosts, each holding 4 recent backups, so I have 12 full backups of the Vertica database available?
For number 1: no, no data is missing. Since you took the backup before adding the node, it is just present on 3 nodes.
Can you share the output of du -sh /home/dbadmin/backups on all nodes?
Sruthi, I created a backup before adding the node on Friday PM. We have backups executing every day at midnight, so there are 4 backups currently available:
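One way to gather this from a single node, assuming passwordless SSH and the placeholder host names node1-node4:

# Report the size of the backups directory on every node (host names are placeholders)
for h in node1 node2 node3 node4; do
  echo -n "$h: "
  ssh "$h" du -sh /home/dbadmin/backups 2>/dev/null || echo "no backups directory"
done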
1. backup before adding a new node
2. Sat AM backup after node was added (3 nodes participating)
3. Sun AM backup after node was added (3 nodes participating)
4. Mon AM backup after node was added (3 nodes participating)
Tonight, the automated backup will drop the snapshot taken before the new node was added.
du -sh /home/dbadmin/backups:
node1 823G
node2 822G
node3 832G
node4 (no backups directory; I have not added node4 to [Mapping] in the ini file)
In such a scenario, the backup doesn't contain the data from node4 (basically the new data that has been loaded into the database after adding node4).
Sruthi, the way I understood the concept of adding a node is that after the re-balancing process completed, all existing data in the database was evenly distributed among the 4 nodes. There was a decrease (in used storage) of ~100 GB on each original node and I saw >300 GB added to node4. I suspected that this might have been database data.
When the backup ran at midnight, the database was backed up from the 3 participating nodes only. That makes me think that any data distributed to node4 was not backed up, and therefore I have 3 incomplete database backups sitting on each node. Is that a correct assumption?
Using the query below, I concluded that the entire size of my database is 1.4 TB, with data evenly distributed between the nodes:
SELECT node_name, SUM(used_bytes) FROM v_monitor.storage_containers GROUP BY node_name ORDER BY 1;
To correct my issue, I need to add node4 to participate in backups and delete the 3 recent backups on each node, because they are missing the data copied to node4.
Yet, the statement below from the Vertica documentation confuses me:
"You can use one or more backup hosts or a single S3 bucket to back up your database".
I only added 1 node and it seems I have to update [Mapping] in the backup ini file to account for that new node. I wonder how this is managed in a system with over 50 nodes? Shouldn't Vertica add the new host to [Mapping] automatically when a node is added, and create the backups structure on the new node?
Since I have 3 nodes, do I really have to open the ini file on each node and update it manually? How do others manage such tasks with many more nodes?
Yes, your understanding is correct. You can remove all the backups using
vbr -t remove -c configfile.ini --archive="all"
There is no need to copy/create the ini file on all nodes of your cluster. It has to be created on just the one node from which you run the vbr script.
Sruthi, how are such updates/removals managed on systems with many more nodes than I have? How can I update the backup ini file so it is automatically copied to all other nodes in the cluster? Can the backups folder (and all related subfolders) be created automatically by the Vertica cluster when using the init command?
vbr -t init -c full_backup.ini
vbr works on the principle that you, the user, are given full control to maintain the *.ini file the way you want it. Today, it looks like your backups go to local disks. In the future, if needed, you can modify the config file to back up to S3, remote hosts, NFS, another Vertica cluster, or even just back up certain objects/tables.
Like @SruthiA said, you just need to maintain one ini file on any one node of the cluster. From what was posted, it looks like you're maintaining one config file per node. This is unnecessary; there is no need to copy the file over to all nodes.
If you add one new node in a 50-node cluster, edit this one single config file and change the [Mapping] section to include the new node. Then run
vbr -t init -c full_backup.ini
and vbr will check all the nodes to see whether the directories exist and create them if necessary. If it's a backup to S3, it ensures that it can connect to the S3 endpoint, access the bucket, etc. See https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/BackupRestore/VBRUtilityReference.htm
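For example, after adding the fourth node in this thread, the [Mapping] section would just gain one line (the node name and address are placeholders following the pattern above; check your actual node names in the nodes system table):

[Mapping]
v_node0001 = x.x.x.x:/home/dbadmin/backups
v_node0002 = x.x.x.x:/home/dbadmin/backups
v_node0003 = x.x.x.x:/home/dbadmin/backups
v_node0004 = x.x.x.x:/home/dbadmin/backups

Re-running vbr -t init -c full_backup.ini afterwards would then create the missing backups directory on the new node.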
Thank you all. I think I have gathered all the information I needed. We do use one ini file for all nodes. It is my unfamiliarity with the process that created all these questions. There are more questions, but for now, I am set.
FYI, since this post turned into backup discussion, I opened a new topic: https://forum.vertica.com/discussion/240871/vertica-backup-objects#latest
Thank you all who contributed here.