Accessing path of loaded file from S3

atomix · September 2019

Is there a way for COPY to access the S3 bucket/folder/ of a loaded file and pass it back to field?
This would be super useful in case of having delete / reload a particular folder that has some issue with the data.
Thanks for any pointers.

marcothesane · September 2019

I don't think I understand what you are actually asking for.
A COPY FROM S3 command looks, for example, like this:
COPY myschema.mytable
FROM 'S3://my-bucket/my-dir/my-file.csv.gz' GZIP
DELIMITER ',' ENCLOSED BY '"' DIRECT
Do you want to store the information of 'S3://my-bucket/my-dir/my-file.csv.gz' or parts of it in a column?
Cheers -

atomix · September 2019

Yes, that's correct, I like to capture /my-dir/ specifically. I found the CURRENT_LOAD_SOURCE() function that can be referenced in COPY, however, this only returns 'my-file.csv.gz' and not the rest of full URI of the S3 path.

marcothesane · September 2019

hmm ... then you can only find it again in the query_requests table ...
So I have no better idea than write a script like so:
COPY myschema.mytable FROM 'S3://my-bucket/my-dir/my-file.csv.gz' GZIP DELIMITER ',' ENCLOSED BY '"' DIRECT;
SELECT REGEXP_SUBSTR(request,'from ''s3://my-bucket/([^/]+)/',1,1,'i',1) FROM query_requests
WHERE request_type='LOAD'
ORDER BY start_timestamp DESC LIMIT 1;

REGEXP_SUBSTR
my-dir

marcothesane · September 2019

Using Regular Expressions to parse the newest COPY command found in query_requests, and to extract the part of the string that comes, searched in case-insensitive mode, after the fromkeyword , and the string stub s3://my-bucket/ (we know the name of the bucket and only want the sub-directory), but before the next forward slash.

Accessing path of loaded file from S3

Answers

Leave a Comment