Is there a way for COPY to access the S3 bucket/folder of a loaded file and pass it back to a field?
This would be super useful in case we have to delete and reload a particular folder that has some issue with its data.
Thanks for any pointers.
I don't think I understand what you are actually asking for.
A COPY FROM S3 command looks, for example, like this:
COPY myschema.mytable
FROM 'S3://my-bucket/my-dir/my-file.csv.gz' GZIP
DELIMITER ',' ENCLOSED BY '"' DIRECT
Do you want to store the information of 'S3://my-bucket/my-dir/my-file.csv.gz' or parts of it in a column?
Cheers -
Yes, that's correct; I'd like to capture /my-dir/ specifically. I found the CURRENT_LOAD_SOURCE() function, which can be referenced in COPY; however, it only returns 'my-file.csv.gz' and not the rest of the full URI of the S3 path.
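For illustration, this is the split the question is after: given the full URI, standard URL/path handling separates the sub-directory from the file name that CURRENT_LOAD_SOURCE() returns (a Python sketch using the example path from above):

```python
from posixpath import dirname, basename
from urllib.parse import urlparse

uri = "s3://my-bucket/my-dir/my-file.csv.gz"
parsed = urlparse(uri)             # scheme='s3', netloc='my-bucket', path='/my-dir/my-file.csv.gz'
directory = dirname(parsed.path)   # '/my-dir' -- the part the question wants
filename = basename(parsed.path)   # 'my-file.csv.gz' -- all that CURRENT_LOAD_SOURCE() gives you
print(directory, filename)
```

The point of the sketch is only to show that the sub-directory is recoverable from the full URI; inside COPY itself, that full URI is not exposed, which is what the rest of the thread works around.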
Hmm ... then you can only find it again in the query_requests table ...
So I have no better idea than to write a script like so:
COPY myschema.mytable FROM 'S3://my-bucket/my-dir/my-file.csv.gz' GZIP DELIMITER ',' ENCLOSED BY '"' DIRECT;
SELECT REGEXP_SUBSTR(request,'from ''s3://my-bucket/([^/]+)/',1,1,'i',1)
FROM query_requests
WHERE request_type='LOAD'
ORDER BY start_timestamp DESC LIMIT 1;
This uses a regular expression to parse the newest COPY command found in query_requests, extracting the part of the string that comes, matched case-insensitively, after the FROM keyword and the string stub s3://my-bucket/ (we know the name of the bucket and only want the sub-directory) but before the next forward slash. Here, REGEXP_SUBSTR returns my-dir.
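The capture-group logic can be checked outside the database; this Python sketch applies the same pattern to a sample request string (the doubled '' in the SQL literal is an escaped single quote, so the actual pattern contains one quote):

```python
import re

# The COPY statement as it would appear in the 'request' column of query_requests.
request = ("COPY myschema.mytable FROM 'S3://my-bucket/my-dir/my-file.csv.gz' "
           "GZIP DELIMITER ',' ENCLOSED BY '\"' DIRECT;")

# Same pattern as in the REGEXP_SUBSTR call: match case-insensitively after
# FROM 's3://my-bucket/ and capture everything up to the next forward slash.
match = re.search(r"from 's3://my-bucket/([^/]+)/", request, re.IGNORECASE)
print(match.group(1))  # my-dir
```

Note this ties the pattern to one known bucket name; loads from other buckets would need the bucket part generalized as well.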