Schema-level backup size on a 3-node cluster
We have a 3-node Vertica 7.1 cluster and I want to build a schema-level backup.
I have followed the vbr utility instructions and built the following config file:
snapshotName = vg_stats_snapshot
verticaConfig = True
restorePointLimit = 2
objects = bd_snapshot
tempDir = /vertica_backup/tmp/vbr
dbName = vertica
dbUser = dbadmin
dbPromptForPassword = True
v_vertica_node0001 = bi-vertica01:/vertica_backup
v_vertica_node0002 = bi-vertica02:/vertica_backup
v_vertica_node0003 = bi-vertica03:/vertica_backup
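For completeness, this is roughly how we invoke the backup (the config filename here is ours, and the vbr path may differ per installation):

```shell
# Sketch of the invocation; the config filename is ours.
# vbr ships with Vertica under /opt/vertica/bin.
/opt/vertica/bin/vbr.py --task backup --config-file vg_stats_backup.ini
```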
where /vertica_backup is an NFS mount.
The problem is that the backup size is not what we expected, and we don't know why.
First of all, I would have expected that:
- the first snapshot would create 3 directories /vertica_backup/v_vertica_node000i/bd_snapshot/* and their combined size would basically depend on how the data is segmented
- following snapshots would be incremental since restorePointLimit is > 1
What we found is:
- there are 3 directories, yes, but they are all the same size (first strange thing): if the schema is 100 GB, this first snapshot uses roughly 100 GB × 3
- each following snapshot, taken without changing anything in Vertica (no inserts, updates, or deletes), has exactly the same size as the first one: each following snapshot is again 100 GB per node, up to the maximum of 2 (restorePointLimit)
- as a result we have a backup of about 300 GB (first snapshot across the 3 nodes) + 600 GB (2 "incremental" backups per node: 3 × 2 × 100 GB) = 900 GB. This is really far from the roughly 300 GB of backup we expected.
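To make the arithmetic explicit, here is a small sketch of the numbers above (the values are the round figures from this post, not exact measurements):

```python
SCHEMA_GB = 100      # approximate schema size
NODES = 3            # cluster size
RESTORE_POINTS = 2   # restorePointLimit in the config

# What we observe: every node directory holds a full copy of the
# schema, and each restore point adds another full copy per node.
observed_gb = SCHEMA_GB * NODES * (1 + RESTORE_POINTS)

# What we expected: one full snapshot spread over the node
# directories, with near-zero growth for the incremental
# restore points.
expected_gb = SCHEMA_GB * NODES

print(observed_gb, expected_gb)  # 900 300
```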
In the attachment we posted a sample image that shows the space usage for this situation (just with less data).
Replicating the data on a single-node Vertica 6.1.2 cluster (we didn't have a single-node Vertica 7.1 to compare with) with the same configuration works as expected (the incrementals are correct: only a small amount of metadata is added, and no real data, since nothing changed since the previous snapshot). So this could be related to the Vertica version or to the number of nodes.
The strange things are basically:
- the cluster backup seems to replicate the whole schema snapshot on each node
- incremental backup does not seem to work, since every snapshot has the same size as the previous one (catalog data is replicated at each snapshot, but it is very small; the problem is the size of the real data)
We would like to know why this happens and how to work around it. This is a big problem for our production environment, since backups are much larger and slower than expected. We have a customer account and would like to receive support.