Vertica cluster (DC) to cluster (DR) replication, maybe using streaming technology

msanjib Vertica Customer

Hi Team,
I have a customer setup consisting of a 3-node cluster. They would like to set up a DR site that is an exact replica of production. I am looking for a replication solution from the production cluster to the DR cluster with no event loss.
AFAIK, I can use the vbr tool supported by Vertica to replicate the entire cluster. However, I am skeptical about the data loss part. How frequently will it sync the data? Does it use some kind of Kafka streaming, so that we can set an offset and not lose any data? I don't know how it works internally.
The other option I found in the Vertica docs is to use Kafka for data replication through Vertica's built-in Kafka package. Can someone help me understand how it works?

Thanks in advance,



  • Jim_Knicely Administrator
    edited December 2019

    You said: "I am skeptical about the data loss part" - What data loss part? The replication operation runs in a transaction, so failures won't cause any data corruption.

    You said: "In what frequency will it sync the data" - That's up to you. Vbr replication is incremental: the first replication copies all the data, and subsequent replications copy only modified or newly added data. You can create a cron job to run vbr on a set schedule.
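The scheduled run described above could be wrapped in a small script that cron calls. This is only a sketch: the config file name `replicate.ini` and the helper names are hypothetical, and it assumes `vbr` is on the PATH of the user that cron runs as. The `--task replicate` and `--config-file` options are the standard vbr command-line options.

```python
import subprocess
from typing import Callable, List


def build_replicate_command(config_file: str, vbr_path: str = "vbr") -> List[str]:
    """Build the vbr command line for a replicate task.

    config_file is a vbr .ini file describing the source objects and the
    target cluster (the file name used by callers here is hypothetical).
    """
    return [vbr_path, "--task", "replicate", "--config-file", config_file]


def run_replication(config_file: str, runner: Callable = subprocess.run) -> bool:
    """Run one incremental replication; return True if vbr exited with 0.

    runner is injectable so the function can be exercised without a
    Vertica installation.
    """
    result = runner(build_replicate_command(config_file), check=False)
    return result.returncode == 0


# Example crontab entry (paths are assumptions), every 15 minutes:
# */15 * * * * /usr/bin/python3 /home/dbadmin/replicate.py
```

A script run by cron would call `run_replication("replicate.ini")` and alert on a `False` result. If one run can take longer than the schedule interval, some form of locking is worth adding so that two runs do not overlap.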

  • msanjib Vertica Customer

    Thanks for the quick response.
    My only concern is the high volume and high velocity of the data. The application is an event generator; the event rate can reach around 5-10K EPS, and both velocity and size can go very high. I can see a delta of 3 GB in 5 minutes, and that has to be synced over a WAN link.
    I believe the vbr tool uses rsync for replication, and I believe object-level replication is the best fit for this solution. I am just wondering how it checks the offset and avoids data duplication if a file is already synced and we try to restore it again.
    It would be a great help if there is any best-practice documentation for cluster replication.

    One slightly related question: since the customer operates under compliance requirements, I would like to set the encrypt and checksum options to true. That will impact performance as well, right?
    Do you see any other option for this case, or is this the best option? I was also checking another option where we stream the data using the Kafka integration (vkconfig tool). Any advice?
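To put the 3 GB per 5 minutes figure in terms of WAN capacity, a rough back-of-the-envelope calculation can help. The sketch below assumes decimal gigabytes (1 GB = 10^9 bytes) and ignores protocol overhead, compression, and the encryption cost mentioned above; the 100 Mbit/s link speed is a hypothetical value chosen only for illustration.

```python
GB = 10**9  # decimal gigabyte, an assumption about how the delta was measured


def required_mbps(delta_bytes: float, window_seconds: float) -> float:
    """Sustained throughput (Mbit/s) needed to ship delta_bytes within the window."""
    return delta_bytes * 8 / window_seconds / 1e6


def transfer_seconds(delta_bytes: float, link_mbps: float) -> float:
    """Time (s) to move delta_bytes over a link of link_mbps, with no overhead."""
    return delta_bytes * 8 / (link_mbps * 1e6)


# 3 GB of change every 5 minutes:
rate = required_mbps(3 * GB, 5 * 60)
print(f"sustained rate needed: {rate:.0f} Mbit/s")  # 80 Mbit/s

# The same 3 GB delta over a hypothetical 100 Mbit/s WAN link:
secs = transfer_seconds(3 * GB, 100)
print(f"transfer time: {secs:.0f} s")  # 240 s
```

In other words, the link has to sustain roughly 80 Mbit/s just to keep up with the change rate, before any overhead. A slower link means the DR cluster falls further behind on every cycle, whatever the sync interval.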

  • Xin_DM Employee
    edited January 2020

    We use the file size to establish file identity. If a file is already synced and a sync is attempted again, rsync lists the file first and checks whether the file size matches; if it matches, we skip the copy.
    Yes, encryption requires process overhead.

    How far behind can the DR cluster be allowed to fall? Would you like to sync daily or hourly?
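The size-based skip described above can be illustrated with a small local-filesystem sketch. This is not vbr's or rsync's actual code; the function names are hypothetical and it only mirrors the idea of "same name and same size means already synced".

```python
import shutil
from pathlib import Path


def needs_copy(src: Path, dst: Path) -> bool:
    """Return True if dst is missing or its size differs from src."""
    if not dst.exists():
        return True
    return src.stat().st_size != dst.stat().st_size


def sync_file(src: Path, dst: Path) -> bool:
    """Copy src to dst unless the size check says it is already synced.

    Returns True if a copy was performed, False if it was skipped.
    """
    if not needs_copy(src, dst):
        return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return True
```

Calling `sync_file` twice on the same unchanged file copies it the first time and skips it the second, which is why re-running a sync does not duplicate data. A size-only comparison cannot detect a file whose content changed while its size stayed the same, so it relies on files not being rewritten in place.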

  • msanjib Vertica Customer

    Thanks for the comment. I am looking for syncing every 10-15 minutes.
