vbr configuration for incremental backup
Hi,
I'm trying to configure a backup strategy of one monthly full backup plus daily incrementals.
I set restorePointLimit = 30 in the backup .ini file, but I see a full backup created every day.
The documentation is not very helpful on this subject:
https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/AdministratorsGuide/BackupRestore/RepeatingBackups.htm?tocpath=Administrator's Guide|Backing Up and Restoring the Database|Creating Backups|_____5
I'm using Vertica 9.3, if that makes a difference.
What is wrong with my setup?
Best Answer
LenoyJ (Employee)
And for future additional reference, continuing from my previous post, let's add some data to the database and see what happens with the backups/restore points:
We had already backed up 2gb of the data previously:
dbadmin=> SELECT (SUM(used_bytes)/1024/1024/1024)::integer as used_bytes_gb FROM storage_containers WHERE node_name='v_lenoy_ent_3n1_node0001';
 used_bytes_gb
---------------
             2
(1 row)
$ vbr -t listbackup -c forum_backup.ini
backup                                backup_type  epoch
incremental_snapshot_20200714_161329  full         69
incremental_snapshot_20200714_160858  full         69
$ du -sh /home/dbadmin/backups
2.5G    /home/dbadmin/backups
Let's add some data:
dbadmin=> COPY store_orders_fact FROM '/home/dbadmin/vmart/Store_Orders_Fact.tbl' DIRECT;
 Rows Loaded
-------------
   100000000
(1 row)
Checking the size:
dbadmin=> SELECT (SUM(used_bytes)/1024/1024/1024)::integer as used_bytes_gb FROM storage_containers WHERE node_name='v_lenoy_ent_3n1_node0001';
 used_bytes_gb
---------------
             4
(1 row)
So I loaded 2gb of additional data for a total of 4gb on node 1. Let's run the backup again with the same config file to create another restore point.
$ vbr -t backup -c forum_backup.ini
Starting backup of database lenoy_ent_3n1.
Participating nodes: v_lenoy_ent_3n1_node0001, v_lenoy_ent_3n1_node0002, v_lenoy_ent_3n1_node0003.
Snapshotting database.
Snapshot complete.
Approximate bytes to copy: 5984580566 of 13881535731 total.
[=================================================.] 99%
Copying backup metadata.
Finalizing backup.
[==================================================] 100%
Backup complete!
listbackup now has 3 backups, and we can restore to any of them:
$ vbr -t listbackup -c forum_backup.ini
backup                                backup_type  epoch
incremental_snapshot_20200714_164058  full         70
incremental_snapshot_20200714_161329  full         69
incremental_snapshot_20200714_160858  full         69
Let's check the directory size of node 1 now:
$ du -sh /home/dbadmin/backups
4.4G    /home/dbadmin/backups
We have 3 full backups and the directory size is just 4.4gb (as opposed to 2+2+4 gb)! We can restore the database to any of the three backups. Let's do that:
I'm choosing the oldest restore point "incremental_snapshot_20200714_160858", which, if you recall, had my original 2gb of data. (Note: you need to shut down the database before a restore.)
$ vbr -t restore -c forum_backup.ini --archive=20200714_160858
Starting full restore of database lenoy_ent_3n1.
Participating nodes: v_lenoy_ent_3n1_node0001, v_lenoy_ent_3n1_node0002, v_lenoy_ent_3n1_node0003.
Restoring from restore point: incremental_snapshot_20200714_160858
Determining what data to restore from backup.
[==================================================] 100%
Approximate bytes to copy: 1912034304 of 7896955165 total.
Syncing data from backup to cluster nodes.
[==================================================] 100%
Restoring catalog.
Restore complete!
Checking the size now:
dbadmin=> SELECT (SUM(used_bytes)/1024/1024/1024)::integer as used_bytes_gb FROM storage_containers WHERE node_name='v_lenoy_ent_3n1_node0001';
 used_bytes_gb
---------------
             2
(1 row)
We're back to 2gb of data! For the heck of it, let's go back to the latest backup/restore point we had (20200714_164058):
$ vbr -t restore -c forum_backup.ini --archive=20200714_164058
Starting full restore of database lenoy_ent_3n1.
Participating nodes: v_lenoy_ent_3n1_node0001, v_lenoy_ent_3n1_node0002, v_lenoy_ent_3n1_node0003.
Restoring from restore point: incremental_snapshot_20200714_164058
Determining what data to restore from backup.
[==================================================] 100%
Approximate bytes to copy: 7896614870 of 13881535731 total.
Syncing data from backup to cluster nodes.
[==================================================] 100%
Restoring catalog.
Restore complete!
Checking the size now:
dbadmin=> SELECT (SUM(used_bytes)/1024/1024/1024)::integer as used_bytes_gb FROM storage_containers WHERE node_name='v_lenoy_ent_3n1_node0001';
 used_bytes_gb
---------------
             4
(1 row)
And we're back to 4gb of data! Hope that helps!
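As a small aside, the --archive values used in the restores above are just the timestamp suffix of the restore point names that listbackup prints. Here is a rough Python sketch of pulling those suffixes out of the listbackup text. The function name archive_ids is made up for illustration, and the snapshot name "incremental_snapshot" is inferred from the restore point names in this thread rather than read from the config file.

```python
def archive_ids(listbackup_output, snapshot_name):
    """Return the timestamp suffix of each restore point, in listed order.

    Assumes restore points are named <snapshot_name>_<timestamp>, as in the
    listbackup output shown in this thread.
    """
    prefix = snapshot_name + "_"
    ids = []
    for line in listbackup_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not one of our restore points.
        if fields and fields[0].startswith(prefix):
            ids.append(fields[0][len(prefix):])
    return ids


# Sample text copied from the listbackup output above.
sample = """\
backup                                backup_type  epoch
incremental_snapshot_20200714_164058  full         70
incremental_snapshot_20200714_161329  full         69
incremental_snapshot_20200714_160858  full         69
"""

ids = archive_ids(sample, "incremental_snapshot")
oldest = ids[-1]  # listbackup shows the newest restore point first
print(f"vbr -t restore -c forum_backup.ini --archive={oldest}")
```

This only builds the command string; it does not run vbr.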
Answers
The restorePointLimit parameter is used for point-in-time backup/restore. This is probably why there are full backups being created. That is, if you have it set to 30 and run the backup once a day, you can return your database to how it was on any day in the last 30 days. For example, if you want to return the database to how it was on 07/01/2020, you can use the --archive parameter when calling restore and set the appropriate date (in this case 20200701_xyz) to do so (you can use the listbackup task to list all backups and get the correct backup name).
For your situation, and if you want the point-in-time recovery to be once a month, I would recommend creating two config files:
1. One with restorePointLimit set to the default (1), run once a day. The backups will be incremental (as long as you use the same config file). If something goes wrong today, you can restore the last day's backup.
2. One with restorePointLimit set to 12, run once a month. After 12 months, you'll have 12 point-in-time full backups (one for each month you ran it).
If you don't care for point-in-time backups, just use the first config file. There may be other ways to do this, but that's my 2 cents.
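To make the two-config-file idea a bit more concrete, here is a rough Python sketch that generates a daily and a monthly .ini. Treat the layout as an assumption: the section names ([Misc], [Database], [Mapping]), the snapshotName, dbName and dbUser keys, and every value other than restorePointLimit are placeholders that are not taken from this thread, so check them against the sample files shipped with your Vertica version. The helper name make_vbr_config is hypothetical.

```python
import configparser
import io


def make_vbr_config(snapshot_name, restore_point_limit):
    """Build a vbr-style .ini as text. The section/key layout is an assumption."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep camelCase keys such as restorePointLimit
    cfg["Misc"] = {
        "snapshotName": snapshot_name,
        "restorePointLimit": str(restore_point_limit),
    }
    # Placeholder database settings; replace with your own.
    cfg["Database"] = {"dbName": "mydb", "dbUser": "dbadmin"}
    # Placeholder mapping: node name = backup host:backup directory.
    cfg["Mapping"] = {"v_mydb_node0001": "backuphost01:/home/dbadmin/backups"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Daily config: default restorePointLimit of 1.
daily_ini = make_vbr_config("daily_snapshot", 1)
# Monthly config: 12 monthly restore points.
monthly_ini = make_vbr_config("monthly_snapshot", 12)

print(daily_ini)
print(monthly_ini)
```

Each generated text would be saved as its own file and passed to vbr with -c, the daily one from a daily job and the monthly one from a monthly job.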
Hi,
Thank you for the answer.
I'm very new to Vertica, so maybe I don't understand the concepts.
I want a setup with a full backup (created on the first day of the month as a baseline) and daily incrementals (to save disk space). I also need to be able to restore the database not only to the first day of the month, but to any day on which an incremental was created.
The next month I need a new full backup and new incrementals until the end of the month. Previous backups can be discarded.
Is it possible?
Now, with restorePointLimit=1, I have just two FULL backups (not incremental, as you wrote).
When I set restorePointLimit=30, I also get FULL backups.
-bash-4.1$ vbr -t listbackup -c backup_test.ini
backup backup_type epoch
Backup_Incr_20200713_220004 full 80553850
Backup_Incr_20200712_220003 full 80386579
Backup_Incr_20200711_220004 full 80359377
Backup_Incr_20200710_220004 full 80331626
...and so on.
The definition of a full backup is as listed in the docs: https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/AdministratorsGuide/BackupRestore/TypesOfBackups.htm
Pay attention to this line:
listbackup says it's a "full" backup because it is a full backup as per the Vertica definition. But that does not mean that when you run it twice it will back up all files twice at the file system level. Full backups using the same config file are always incremental and only copy the deltas. Let's take an example. I have a database with approximate data size on node 1 as below:
Now I created a config file with restorePointLimit as 2.
I ran the backup. And listbackup now shows something like the following:
Let's check the backup directory size:
It's backed up 2.5gb (data + catalog). Let me run the backup again with the same config file, without adding any new data.
Notice that it copied 0 bytes even though it is a full backup with restorePointLimit of 2? Let's check the size at the directory level again.
It's still the same size. And listbackup has two full backups/restore points; you can restore back to any of them:
Hope that helps to understand that full backups (as per the Vertica definition) are always incremental and you can restore back to any of the backups you took.
Now, you said:
In this case, setting restorePointLimit to 30 is accurate. If you run it once a day, you will be able to restore to any one day in the last 30 days. After the 31st run, the oldest backup will be removed.
Hi,
This is a comprehensive answer!
I was misled by the word 'full' when I ran vbr -t listbackup. I expected something like 'incr'.
Just out of curiosity: in which cases does vbr report a different value in the backup_type field?
The ones that show up are full backups, object-level backups and hard-link local backups. vbr in Enterprise mode also supports replicating objects and copying an entire cluster from one to another, but these won't be listed in listbackup as they aren't backups that you can restore back to...
LenoyJ: Thanks for the good explanation. However, I'm curious about how Vertica handles the restore point limit. I mean, if I set my restorePointLimit parameter to 7 and take a backup, the first one will be a "full_backup" and the following ones will be incremental, but on the 8th day, when it has to remove the oldest backup, would it remove the "full_backup"? If yes, how can the backup be a complete backup?
@Girish_Nanjappa, good question. When you run a backup, vbr creates a "manifest" which contains a list of all the files vbr needs for restoring to that restore point. When the time comes to remove the oldest backup, vbr will only remove the data files that no other restore point is referencing. That way your second oldest restore point will be your complete backup.
You can go into your backup directory and look inside these manifest files for yourself. There will be a "backup manifest" which contains all files referenced by all restore points. And there will be "snapshot manifests", one for each restore point, each containing all the files needed for that particular restore point.
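The pruning rule described above can be pictured with a toy Python model. This is not vbr's actual code or manifest format, just a sketch of the idea: each restore point carries the set of files it needs, and dropping the oldest restore point only frees files that no remaining restore point references. The function name take_backup, the keep parameter and the file names are all hypothetical. In this thread, the number of restore points listed was restorePointLimit + 1 (a limit of 2 showed 3 backups), so keep stands for that total.

```python
def take_backup(restore_points, name, live_files, keep):
    """Record a restore point and prune the oldest ones beyond `keep`.

    restore_points: list of (name, set_of_files), oldest first, standing in
    for the per-restore-point "snapshot manifests".
    Returns the set of files that could be deleted from the backup directory.
    """
    restore_points.append((name, set(live_files)))
    deletable = set()
    while len(restore_points) > keep:
        _, old_files = restore_points.pop(0)
        # Union of the remaining snapshot manifests (the "backup manifest").
        still_needed = set().union(*(files for _, files in restore_points))
        # Only files that no remaining restore point references are freed.
        deletable |= old_files - still_needed
    return deletable


points = []
take_backup(points, "day1", {"a", "b"}, keep=2)
take_backup(points, "day2", {"b", "c"}, keep=2)
removed = take_backup(points, "day3", {"b", "c", "d"}, keep=2)
print(removed)  # {'a'}
```

File "b" came from the very first backup, but it survives the removal of day1 because day2 and day3 still reference it. That is why the oldest remaining restore point is always complete.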
Backup manifest:
One of my Snapshot manifests:
Thanks in advance.
Already done - you can find them in /opt/vertica/share/vbr
https://www.vertica.com/docs/10.1.x/HTML/Content/Authoring/AdministratorsGuide/BackupRestore/SampleConfigFiles/SampleIniFiles.htm?tocpath=Administrator's Guide|Backing Up and Restoring the Database|Sample VBR .ini Files|_____0
Can you also please help me with how I can schedule the backup and restore process in crontab? Please share the steps.
I'm new to Vertica; I've just started learning.
Please guide me on how I can set up Vertica schema replication.
I have a 3-node cluster at the DC location and a 3-node cluster at the DR location.
I want to replicate my vdu user from DC to DR and from DR to DC.
----
1. Both locations can ping and reach each other.
2. Both locations have passwordless SSH set up.
3. Both locations have the same database name and user name, and the same node names with different IP addresses.
Thanks in advance.