Increase efficiency of backup
Hi, I have a thought on making the Vertica backup process (based on vbr.py) more efficient. Currently, at the operating system file level, the same data is stored multiple times, often across multiple nodes, to improve Vertica performance. This is good from a DB access perspective, but it needs to be handled properly for the backup strategy to be efficient. The current vbr.py backup process rsyncs the files to the backup server as if they were all different files. There are utilities that can check whether two files are identical and, instead of keeping both as duplicates, delete one copy and replace it with a hard link to the other, so that both names share the same inode. For example, check out the utility http://fossies.org/linux/privat/fslint-2.42.tar.gz:a/fslint-2.42/fslint/findup If Vertica can come up with a similar strategy, it could make backups considerably more efficient. Zacharia Mathew
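As a rough illustration of the hard-link idea, here is a minimal Python sketch. It is not how vbr.py or findup works internally; the function name `hardlink_duplicates` is hypothetical, and the sketch assumes all files live on one filesystem on the backup server (hard links cannot cross filesystems) and that the files are not modified in place afterwards, since linked names share the same content.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root):
    """Replace byte-identical files under root with hard links to one copy.

    Hypothetical helper for illustration. Returns the number of files replaced.
    """
    # Step 1: group regular files by size; only same-size files can match.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    replaced = 0
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Step 2: within a size group, group by content hash.
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        for same in by_hash.values():
            keeper = same[0]
            for dup in same[1:]:
                if os.path.samefile(keeper, dup):
                    continue  # already share an inode
                # Link under a temporary name, then rename over the duplicate,
                # so the duplicate's name never disappears.
                tmp = dup + ".dedup-tmp"
                os.link(keeper, tmp)
                os.replace(tmp, dup)
                replaced += 1
    return replaced
```

Calling `hardlink_duplicates("/path/to/backup")` on a directory of your choice (the path here is only a placeholder) would leave every file name in place while storing each distinct content once.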