We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Accessing path of loaded file from S3 — Vertica Forum

Accessing path of loaded file from S3

Is there a way for COPY to access the S3 bucket/folder/ of a loaded file and pass it back to field?
This would be super useful in case of having delete / reload a particular folder that has some issue with the data.
Thanks for any pointers.

Tagged:

Answers

  • marcothesanemarcothesane - Select Field - Administrator

    I don't think I understand what you are actually asking for.
    A COPY FROM S3 command looks, for example, like this:
    COPY myschema.mytable
    FROM 'S3://my-bucket/my-dir/my-file.csv.gz' GZIP
    DELIMITER ',' ENCLOSED BY '"' DIRECT
    Do you want to store the information of 'S3://my-bucket/my-dir/my-file.csv.gz' or parts of it in a column?
    Cheers -

  • Yes, that's correct, I like to capture /my-dir/ specifically. I found the CURRENT_LOAD_SOURCE() function that can be referenced in COPY, however, this only returns 'my-file.csv.gz' and not the rest of full URI of the S3 path.

  • marcothesanemarcothesane - Select Field - Administrator

    hmm ... then you can only find it again in the query_requests table ...
    So I have no better idea than write a script like so:
    COPY myschema.mytable FROM 'S3://my-bucket/my-dir/my-file.csv.gz' GZIP DELIMITER ',' ENCLOSED BY '"' DIRECT;
    SELECT REGEXP_SUBSTR(request,'from ''s3://my-bucket/([^/]+)/',1,1,'i',1) FROM query_requests
    WHERE request_type='LOAD'
    ORDER BY start_timestamp DESC LIMIT 1;

    REGEXP_SUBSTR
    my-dir

  • marcothesanemarcothesane - Select Field - Administrator

    Using Regular Expressions to parse the newest COPY command found in query_requests, and to extract the part of the string that comes, searched in case-insensitive mode, after the fromkeyword , and the string stub s3://my-bucket/ (we know the name of the bucket and only want the sub-directory), but before the next forward slash.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file