The Vertica Forum recently got a makeover! Let us know what you think by filling out this short, anonymous survey.

Accessing path of loaded file from S3

Is there a way for COPY to access the S3 bucket/folder/ of a loaded file and pass it back to field?
This would be super useful in case of having delete / reload a particular folder that has some issue with the data.
Thanks for any pointers.

Tagged:

Answers

  • I don't think I understand what you are actually asking for.
    A COPY FROM S3 command looks, for example, like this:
    COPY myschema.mytable
    FROM 'S3://my-bucket/my-dir/my-file.csv.gz' GZIP
    DELIMITER ',' ENCLOSED BY '"' DIRECT
    Do you want to store the information of 'S3://my-bucket/my-dir/my-file.csv.gz' or parts of it in a column?
    Cheers -

  • Yes, that's correct, I like to capture /my-dir/ specifically. I found the CURRENT_LOAD_SOURCE() function that can be referenced in COPY, however, this only returns 'my-file.csv.gz' and not the rest of full URI of the S3 path.

  • hmm ... then you can only find it again in the query_requests table ...
    So I have no better idea than write a script like so:
    COPY myschema.mytable FROM 'S3://my-bucket/my-dir/my-file.csv.gz' GZIP DELIMITER ',' ENCLOSED BY '"' DIRECT;
    SELECT REGEXP_SUBSTR(request,'from ''s3://my-bucket/([^/]+)/',1,1,'i',1) FROM query_requests
    WHERE request_type='LOAD'
    ORDER BY start_timestamp DESC LIMIT 1;

    REGEXP_SUBSTR
    my-dir

  • Using Regular Expressions to parse the newest COPY command found in query_requests, and to extract the part of the string that comes, searched in case-insensitive mode, after the fromkeyword , and the string stub s3://my-bucket/ (we know the name of the bucket and only want the sub-directory), but before the next forward slash.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.