Loading data from non-AWS S3

Hi,

Is there any way to load data into Vertica from a non-AWS service that exposes an Amazon S3-compatible API, for example Ceph or something similar?

It seems almost possible, except that Vertica depends on the AWSRegion setting to figure out where to connect. Is there a workaround for this? Has anyone had any luck setting up something similar?

(We're using Vertica 9.1)

Answers

  • Jim_Knicely Administrator

    Maybe create and mount a Ceph Filesystem?

    See:
    http://docs.ceph.com/docs/kraken/cephfs/

  • We would strongly prefer to keep reusing the same loading path we use for Amazon S3. Is the code for the AWS loader available anywhere? We could probably modify the UDSource fairly easily to fit both.

    Re: loading from a file system: doable in principle, but if we are forced to ditch the initial idea of loading data using Vertica itself, we'll probably turn to something different, such as a Python or Scala script that accepts a generic S3 bucket. Mounting a Ceph filesystem is maybe a bit too fragile.

  • Hey gytis, I realize this was asked some time ago, and I'm not sure whether you have a solution yet, but I think I have one for you.

    It sounds like this is an on-prem database, correct? You can modify the AWSEndpoint parameter in Vertica to point to the host:port of your Ceph or other S3-compatible API. Ensure that AWSRegion is consistent with the region your API expects. Then you can reference bucket names in COPY commands as you would against S3 proper. Make sure all S3 authentication parameters are set according to the authentication your S3-compatible API requires.
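
To make the steps above concrete, here is a rough Python sketch that only builds the SQL statements as strings; it does not connect to a database. AWSEndpoint and AWSRegion come from the suggestion above. AWSAuth and AWSEnableHttps, the `ALTER SESSION SET` form, and every host, key, table, and bucket name are assumptions for illustration, so please check them against the documentation for your Vertica version (some parameters may need to be set at the database level instead).

```python
def s3_compatible_load_statements(endpoint, region, access_key, secret_key,
                                  table, s3_url, use_https=True):
    """Build (but do not run) SQL for loading from an S3-compatible store.

    Parameter names other than AWSEndpoint and AWSRegion are assumptions;
    verify them against the docs for the Vertica version in use.
    """
    def quote(value):
        # Minimal SQL string-literal quoting: double any single quotes.
        return "'" + str(value).replace("'", "''") + "'"

    return [
        # Point Vertica at the S3-compatible service instead of AWS.
        f"ALTER SESSION SET AWSEndpoint = {quote(endpoint)};",
        # Keep the region consistent with what the service expects.
        f"ALTER SESSION SET AWSRegion = {quote(region)};",
        # Assumed parameter: credentials in 'access_key:secret_key' form.
        f"ALTER SESSION SET AWSAuth = {quote(access_key + ':' + secret_key)};",
        # Assumed parameter: set to 0 if the endpoint does not speak HTTPS.
        f"ALTER SESSION SET AWSEnableHttps = {1 if use_https else 0};",
        # Reference the bucket exactly as you would for S3 proper.
        f"COPY {table} FROM {quote(s3_url)};",
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    for statement in s3_compatible_load_statements(
        endpoint="ceph.example.internal:7480",
        region="us-east-1",
        access_key="MYACCESSKEY",
        secret_key="MYSECRETKEY",
        table="public.events",
        s3_url="s3://my-bucket/events/part-0001.csv",
        use_https=False,
    ):
        print(statement)
```

You would then run the printed statements in vsql or through whatever client you already use for loads from Amazon S3.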
