Replication VBR deleted the data

veerkumar · October 2020

We have a 3-node cluster at each of our primary and secondary sites. Normally we run delta replication from primary to secondary, and everything works fine.
Due to a disaster at the primary site, the secondary was brought up and data started coming in to the secondary site.
Now that the primary site is functional again, we have shifted all the services from secondary back to primary.
At this point, for the days when the primary was down, the data count in the secondary is greater than in the primary (the primary data count is zero for those days).
When we restarted delta replication from primary to secondary, all the data that was in the secondary site for those days was suddenly deleted.
Ideally, delta replication should only copy the data that is available in the primary and missing from the secondary, and it should not delete any data at any cost.

Ideally it should look like the table below. On day 6, services are running from PR and delta replication is started from PR to DR.
Day  Site/DataCount  Site/DataCount
1    PR/100          DR/100
2    PR/100          DR/100
3    PR/0            DR/100
4    PR/0            DR/100
5    PR/0            DR/100
6    PR/100          DR/100
7    PR/100          DR/100

After recovering from the disaster, we started replication from PR to DR.
Below is the situation we are currently facing. All the data in the secondary for days 3, 4 and 5 has been deleted.
Day  Site/DataCount  Site/DataCount
1    PR/100          DR/100
2    PR/100          DR/100
3    PR/0            DR/0
4    PR/0            DR/0
5    PR/0            DR/0
6    PR/100          DR/100
7    PR/100          DR/100

Why does this strange behaviour happen? Are we missing something? How do we prevent VBR from deleting any data? Do we need to tune something?

LenoyJ · October 2020

Replicate essentially "replicates" the object (as is) from source to target. If the source object has no data and you trigger replicate, the target object will also reflect that.

@veerkumar said:
and it should not delete any data at any cost.

There are many use cases in the database world where a table's data gets deleted for various valid reasons. You would want your DR environment to reflect these changes as is, else your DR environment will not be an accurate representation of your Primary.

My suggestion is not to trigger a replicate from 'Primary -> DR' when the DR has the most accurate representation of your data. You should rather trigger replicate from 'DR -> Primary' instead.
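
To make the direction issue concrete, here is a toy model in Python. It is only a sketch of the "as is" behaviour described above, not how VBR works internally: each site is a hypothetical dict of day to row count, and `replicate` and `additive_copy` are made-up helper names. The state is the moment the primary comes back, before any new data is loaded there.

```python
# Toy model: each site maps day -> row count for that day.
# Illustrates mirror semantics only; this is not VBR's implementation.

def replicate(source: dict[int, int], target: dict[int, int]) -> dict[int, int]:
    """Mirror: return the new target state, an as-is copy of the source.

    Whatever the target held before is not consulted at all.
    """
    return dict(source)


def additive_copy(source: dict[int, int], target: dict[int, int]) -> dict[int, int]:
    """What the original post expected: add missing days, delete nothing."""
    merged = dict(target)
    for day, count in source.items():
        merged.setdefault(day, count)
    return merged


# State when the primary comes back: it never received days 3-5.
pr = {1: 100, 2: 100}
dr = {1: 100, 2: 100, 3: 100, 4: 100, 5: 100}

print("PR -> DR mirror:  ", replicate(pr, dr))      # DR loses days 3-5
print("DR -> PR mirror:  ", replicate(dr, pr))      # PR gains days 3-5
print("PR -> DR additive:", additive_copy(pr, dr))  # nothing is deleted
```

In this model the order matters: the DR -> Primary mirror has to happen before new data lands on the primary, otherwise the same "as is" rule would overwrite whatever the primary had loaded in the meantime.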

I could also imagine you using the objectRestoreMode=coexist parameter to ensure that data does not get overwritten and then if your tables are partitioned, use swap partition to instantaneously swap the accurate data into the primary's table.

Also, it's good to know that VBR uses rsync internally for replicate and copycluster activities.
Others may chime in here for any other ideas.

veerkumar · October 2020

Thank you for the response.
Is there any configuration change we can make so that it does not "replicate" the object, but instead just inserts the new data from the primary into the destination?

viganog · October 2020

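If this is only for specific tables, you can use EXPORT TO VERTICA.
1) Identify the epoch of your backup:
$ vbr.py --task listbackup --config-file backup.ini
backup backup_type epoch .....
bck_snapshot_20200722_092104 full 196 .....
bck_snapshot_20200722_085950 full 193 ......
2) Export only the rows with an epoch higher than the backup epoch. EXPORT TO VERTICA needs a destination table and an open CONNECT TO VERTICA session in the same vsql session; target_db.public.my_table below is a placeholder:
vsql> EXPORT TO VERTICA target_db.public.my_table AS SELECT * FROM my_table WHERE epoch > 196;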
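
The two steps above could be glued together with a small script. The following is a rough Python sketch, assuming the listbackup output looks like the excerpt in this post (backup name, backup type, then epoch as the third whitespace-separated field). The function names and the destination `dr_db.public.my_table` are hypothetical, and the generated statement still has to be run in a vsql session that has already issued CONNECT TO VERTICA for the destination database.

```python
def latest_backup_epoch(listing: str) -> int:
    """Return the highest epoch found in listbackup-style text."""
    epochs = []
    for line in listing.splitlines():
        fields = line.split()
        # Expected layout: <backup> <backup_type> <epoch> ...
        # The header row is skipped because its third field is not a number.
        if len(fields) >= 3 and fields[2].isdigit():
            epochs.append(int(fields[2]))
    if not epochs:
        raise ValueError("no backup epochs found in listing")
    return max(epochs)


def export_statement(target: str, source: str, epoch: int) -> str:
    """Build an EXPORT TO VERTICA statement for rows newer than the epoch."""
    return (
        f"EXPORT TO VERTICA {target} "
        f"AS SELECT * FROM {source} WHERE epoch > {epoch};"
    )


# Sample text copied from the listbackup excerpt in this thread.
sample = """backup backup_type epoch .....
bck_snapshot_20200722_092104 full 196 .....
bck_snapshot_20200722_085950 full 193 ......"""

epoch = latest_backup_epoch(sample)
# dr_db.public.my_table is a made-up destination name.
print(export_statement("dr_db.public.my_table", "my_table", epoch))
```

Picking the highest epoch matches the example above, which filters on 196. If the destination was last synchronized from an older backup, that backup's epoch would be the one to use instead.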
