how to synchronize tables between clusters
Due to some reason(hardware failure and others), I have two vertica clusters, and have set up ETL from MySQL (and other systems) to send the same data to two clusters(6TB). But our users created lots of temp tables(under a different "temp" schema) to hold some intermediate data (600GB) and they will use these tables to join with other ETLed tables. These temp tables are changed very often. And users may create their temp table only in one of the cluster. How do I synchronize these data so that whenever users create tables in one cluster, I have this table automatically replicated to another cluster in real-time (or near real-time) ?
0
Comments
Hi Bin,
How did you resolve this problem at the end?
I am facing the same problem at the moment. When I searched solutions online, I found your post. I would appreciate a lot if you share some experiences about solve this problem.
Thanks,
Lili
Hi
As far as I know there isn't replication (real time or not) on vertica.
I'll love to hear other people solutions.
Take a look at my blog post on copying data between clusters. It discusses which methods can export table structures.
With dual ingestion, you will still have to manage DDL replication to other clusters. What it really comes down to is what is an acceptable SLA.
Hi Norbert,
Thank you for your reply. I have read your blog before, which provided me excellent insights on how to copy data between Vertica clusters. That will help me achieve the first step of my plan, which is to copy certain tables from one cluster (let's say Cluster A) to the other cluster (let's say Cluster .
But based on my needs for the plan, the second step is to synchronize corresponding tables in Cluster B once those tables are updated in Cluster A. This step can be achieved by copying all those tables again, but I am wondering whether there is a lightweight way to do this in Vertica. I have been studying Vertica documentation, but haven't came up with a thorough plan. So I am wondering how other people do this, like Bin, who created this post.
Appreciate your time and any comment from you!
Regards,
Lili
You'll need a tool/process to accomplish that. Some organizations track changes by tagging batches and then performing a diff against other systems. For example, a batch would be assigned a certain ID and could then be reconciled on another cluster.
There is a way to do this in 7.2.X with the help of some scripting and vbr.py.
Requirements:
> Source and destination database should have the same number of nodes
> Both databases have to have access to a common location to share backups
Databases can vary on db and node names and IP addresses. Also destination database does not have to be stopped, it must be UP during this operation. Data movements are transactional, meaning you will not see inconsistent or currupted data during failures. It's also incremental, means first copy will copy everything (you don't have to have tables/schemas created manually, operation itself will do it) and subsequent data movements are incremental, it will only copy what's changed in the source database.
How it works:
> Two step process. Use vbr.py, take a backup of objects you want to copy from the source database and restore them to the destination database. You may create a cron job to automate these steps as needed.
Step 1: Take a backup of objects from the source database as you would take an object backup. No special configuration is required other than standard vbr object level backup configs.
Step 2: Restore objects into the destination database. This requires some configurations where you have to map source database nodes to destination database nodes. More details on how to do this mapping can be found in Vertica documentation.
http://my.vertica.com/docs/7.2.x/PDF/HP_Vertica_7.2.x_Administrators_Guide.pdf (P793-795)
As stated, all vbr.py operations (backup and restore) are incremental, it will only copy deltas on subsequent operations.
Things to be aware:
> Every data movement will look like a fresh load in the destination database, if you change data in the destination database, those changes will be overwritten by the latest snapshot of the source database.
Hi tharanga,
Why your way will work only at 7.2.X?
Why I can't do this on 7.X for example ?
Copying objects (schemas or tables) between databases is a new feature we introduced in 7.2.x. Prior to that, you can only restore objects to the same database.
FYI
Replication is supported since 7.2SP3
https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/AdministratorsGuide/BackupRestore/ReplicatingtoanAlternateCluster.htm?Highlight=replication