Back up Vertica to Hadoop

I have a customer who wants to back up a Vertica database to Hadoop HDFS.
We use hadoop-fuse-dfs to mount the remote HDFS as a local file system, for example: hadoop-fuse-dfs dfs://192.168.113.101:8020 /home/dbadmin/hdfs
Then we use vbr.py to back up the database to that mounted file system, and it returns these errors:
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_catalog/Libraries/public_HdfsSource_45035996273739156.so" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_catalog/Snapshots" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data/001" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data/001/45035996273893001" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data/001/45035996273894001" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data/001/45035996273910001" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data/005" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data/005/45035996273893005" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data/005/45035996273894005" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data/005/45035996273910005" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data/009" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data/009/45035996273893009" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data/009/45035996273894009" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data/009/45035996273910009" failed: Unknown error 255 (255)
rsync: chgrp "/home/dbadmin/hdfs/cmsz/backup/vertica_bk/v_cmsz_vertica_node0001/bkcmsz_vertica~new~/home/dbadmin/cmsz_vertica/v_cmsz_vertica_node0001_data/013" failed: Unknown error 255 (255)

My question is: how can we back up a Vertica database to HDFS? We don't want to export the data to CSV and then transfer it to HDFS; that is too inconvenient.


Thanks so much.

Comments

  • Hi Hawk,

    So, we don't officially support backups via hadoop-fuse-dfs, and I'm not personally super-familiar with it.  We expect a "normal" (POSIX-standards-compliant) filesystem.

    In this particular case:  I see lots of errors, but did the backup actually fail?  If you're able to do so without risking your production system, try restoring anyway; see if it works.

    Vertica's backup process tries to set file permissions on the backup to match those of the original catalog.  What's failing is setting the owning group for the files.  Apparently, either Hadoop (or hadoop-fuse-dfs) doesn't know about the Linux-level group that owns these files, or hadoop-fuse-dfs doesn't support setting which group owns a particular file.  Fortunately, we don't do anything particularly exciting with the owning group right now; any reasonable value should work.  (This may change in the future.  As always, if you're doing something a little unusual, make sure to test it prior to deploying any major upgrade of Vertica.)

    Other filesystem permissions do matter.  In particular, some files in our catalog directory have the "execute" permission set.  It needs to stay set across a backup/restore.  Otherwise, C++ user-defined functions, as well as some internal functions as of Vertica 7 (for example, those used by Flex Tables), will not work properly after a restore.

    If you want a solution that's likely to be more reliable, but don't want to manually dump to .csv files:  Use vbr.py to back up temporarily to a regular local filesystem somewhere.  Then use "tar" to pack the backup directory into a single file (or multiple .tar archives, if you prefer), and load that file into Hadoop.  "tar" properly encodes POSIX file permissions.

    Alternatively:  Vertica does have a Hadoop connector.  Rather than dumping to .csv, you could dump to a Hadoop job that stores the data wherever/however you'd like.

    Adam
  • Thanks so much, I will try it.
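
For anyone trying the tar route Adam describes, here is a minimal Python sketch of the packing step, assuming vbr.py has already written the backup to a regular local directory. The function names (pack_backup, executable_files, executable_members) are made up for illustration and are not part of Vertica or Hadoop. It packs the directory with the standard tarfile module and lets you check that the files carrying the "execute" permission on disk carry it in the archive too, since that is the permission Adam says must survive a backup/restore.

```python
import os
import tarfile


def executable_files(root):
    """List regular files under root that have any execute bit set.

    Paths are relative to root's parent and use "/" separators, so they
    can be compared directly with the member names inside the archive.
    """
    root = os.path.abspath(root)
    parent = os.path.dirname(root)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.stat(path).st_mode & 0o111:
                rel = os.path.relpath(path, parent)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def pack_backup(backup_dir, archive_path):
    """Pack backup_dir into one uncompressed tar archive.

    tarfile records each file's mode bits in the archive header, so the
    permissions do not depend on what the target file system supports.
    """
    backup_dir = os.path.abspath(backup_dir)
    with tarfile.open(archive_path, "w") as tar:
        tar.add(backup_dir, arcname=os.path.basename(backup_dir))
    return archive_path


def executable_members(archive_path):
    """List regular-file members of the archive with any execute bit set."""
    with tarfile.open(archive_path, "r") as tar:
        return sorted(
            m.name for m in tar.getmembers() if m.isfile() and m.mode & 0o111
        )
```

A typical use would be to call pack_backup on the local backup directory, confirm that executable_members(archive) equals executable_files(backup_dir), and only then copy the archive into HDFS, for example with "hdfs dfs -put". The paths and the upload step are assumptions about your setup; test a restore from the unpacked archive before relying on it.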
