COPY data from a customer provided S3 bucket
Hi forum,
At this time I can COPY files from our own Vertica provided S3 bucket into an AWS EON V instance without issues:
, SoftwareModule
)
FROM 's3://p-vertica-poc/Data/01-iXR_cdf_events/dump-iXR_cdf_events-2016-01-01-12h-13h.csv.gz' gzip
DELIMITER ',' ENCLOSED BY '"' null E'\002NULL\003'
rejectmax 10
;
Unfortunately I get Vertica error 7160 (access denied) when trying this with files from the customer-provided S3 bucket. awslib is available, and we have run:
ALTER SESSION SET UDPARAMETER FOR awslib aws_id='...';
ALTER SESSION SET UDPARAMETER FOR awslib aws_secret='...';
-- AWS Region: EU-West region: eu-west-1
select set_config_parameter('AWSRegion', 'eu-west-1');
Reaching the customer bucket using the S3 Browser or the aws s3 CLI works.
Can you help me better understand what the V doc says:
Using COPY
Before you can read data from S3, you must create an IAM role for your EC2 instances to use, and grant that role permission to access your S3 resources.
Do you have experience in this area, and do you have hints (or, even better, concrete statements) on what I should do?
Thank you
Dieter
Comments
Hi,
Is the client using AWS Identity and Access Management (IAM)?
Thanks!
Hi Jim,
sorry for my ignorance on this topic.
I am starting the COPY on the Linux node of the V AWS EON instance.
I did not assume any additional installations were needed.
How can I check the issue you raise?
Thanks
Dieter
You use IAM instead of setting the AWS id and secret key. But it looks like you are still setting those. So can you post the output from the COPY command, including the error?
Note: Vertica ERROR 7160 is "Cannot expand glob pattern due to error: string"
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/ErrorCodes/SqlState-22023.htm
Hi Dieter,
You have two options:
1) set session parameters (e.g. aws_id, aws_secret, aws_region), and use AWS UDX: COPY tbl WITH SOURCE S3(url='s3://bucket/...');
2) set config parameters (AWSAuth and AWSRegion), and use the native S3 file system: COPY tbl FROM 's3://bucket/...';
You are getting that error because you are setting the session parameters but using the native S3 file system.
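To make the difference concrete, the two COPY shapes look roughly like this (my_table, the bucket, and the path are placeholders):
-- Option 1: AWS UDX source; credentials come from the awslib session parameters.
COPY my_table WITH SOURCE S3(url='s3://my-bucket/path/file.csv') DELIMITER ',';
-- Option 2: native S3 file system; credentials come from the AWSAuth and AWSRegion configuration parameters.
COPY my_table FROM 's3://my-bucket/path/file.csv' DELIMITER ',';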
Yes asaeidi,
you point exactly to the root of the issue!
I want to run option 2 (assuming it is faster), but I have been setting the parameters for option 1.
I have found AWSRegion but did not find AWSAuth in the V doc:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AdministratorsGuide/ConfiguringTheDB/AWSParameters.htm?Highlight=AWSregion
What do I need to specify for AWSAuth, and are there more parameters I need to set?
Thank you
Dieter
Like this:
select set_config_parameter('AWSAuth','key:secret');
key = aws_id
secret = aws_secret
It's interesting that the AWSAuth parameter is not documented (i.e. not in the online docs). Plus, you cannot see the AWSAuth parameter in the configuration_parameters system table, only in vs_configuration_parameters.
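A quick way to check (this assumes vs_configuration_parameters exposes the same parameter_name column as configuration_parameters):
-- AWSAuth shows up here, but not in configuration_parameters (column name assumed).
SELECT * FROM vs_configuration_parameters WHERE parameter_name ILIKE 'AWS%';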
Is this a feature we should hide from clients?
The AWSAuth parameter took me one step further .. to vertica.log
dbadmin(> )
dbadmin-> FROM 's3://cf-s3-55a4828a-b432-4807-a648-c30cc3fc76e9/Data/01-iXR_cdf_events/dump-iXR_cdf_events-2016-01-01-12h-13h.csv.gz' gzip
dbadmin-> DELIMITER ',' ENCLOSED BY '"' null E'\002NULL\003'
dbadmin-> rejectmax 10
dbadmin-> ;
ERROR 8162: Something went wrong while copying the files to persistent location. Please look at the vertica.log for more details
dbadmin=>
Let's see ...
Dieter
You only need AWSAuth if you don't use IAM.
We recommend using IAM, and that's why AWSAuth is not in the documentation.
@vasaeidi - Gotcha! Thanks for the explanation.
Hi,
I have appended the lines from vertica.log which show the failed COPY and the messages after it. What path should I take now, please?
Thanks
Dieter
From the logs it seems that after changing your global S3 file system configuration, you can no longer write to your communal location, and that is why the COPY fails.
Well, that sounds like it's going to be an issue with an EON deployment!
@DieterC - Looks like you will have to use option #1?
1) set session parameters (e.g. aws_id, aws_secret, aws_region), and use AWS UDX: COPY tbl WITH SOURCE S3(url='s3://bucket/...');
But I wonder if that will even work and result in the same "Access Denied" error?
Hi asaeidi,
for this COPY I need access to two S3 buckets in a single command.
1. "the Vertica bucket" holding the ROS containers
2. The "customer provided bucket" keeping the source data.
Are your comments saying that I can only access a single bucket for the execution of the COPY command?
Does V assume the source data and the ROS containers are in the same bucket?
Other queries run normally, even with AWSAuth set to the customer bucket.
Is there a chance to name two S3 buckets in the AWSAuth command?
Thanks
Dieter
Dieter,
For your use case, I would recommend you use AWS UDX (option 1).
Currently, S3 file system does not support accessing two buckets in different regions at the same time.
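Roughly, something like this should work for your case (the target table is a placeholder, the URL is the one from your log, and I have left out the GZIP handling; check the documentation for how compressed input is handled with the S3 source):
-- Session parameters only affect the awslib UDSource, so the communal location
-- keeps using the database-level credentials for the Vertica bucket.
ALTER SESSION SET UDPARAMETER FOR awslib aws_id='<customer_key_id>';
ALTER SESSION SET UDPARAMETER FOR awslib aws_secret='<customer_secret>';
ALTER SESSION SET UDPARAMETER FOR awslib aws_region='eu-west-1';
COPY <schema>.<table name> WITH SOURCE S3(url='s3://cf-s3-55a4828a-b432-4807-a648-c30cc3fc76e9/Data/01-iXR_cdf_events/dump-iXR_cdf_events-2016-01-01-12h-13h.csv.gz')
DELIMITER ',' ENCLOSED BY '"' NULL E'\002NULL\003' REJECTMAX 10;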
Hi asaeidi,
the two S3 buckets are in the same region, eu-west-1.
Can I please reset AWSAuth? I am now getting problems when collecting statistics; I cannot access my own ROS:
dbadmin=> select ANALYZE_STATISTICS ('');
WARNING 3314: Exception while running analyze_statistics: StorageBundleReader: Error opening file [s3://philips-vertica-poc/EONros/b 97/023f4086ac94eb50337005224d8d6e5600a0000000018696_0.gt] for read: Access Denied
WARNING 3314: Exception while running analyze_statistics: StorageBundleReader: Error opening file [s3://philips-vertica-poc/EONros/d 48/023f4086ac94eb50337005224d8d6e5600a0000000018504_0.gt] for read: Access Denied
Thanks
Dieter
I believe you could reset AWSAuth as below:
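For example, something like this (mydb is a placeholder for your database name):
-- Clear the database-level parameter back to its default.
ALTER DATABASE mydb CLEAR AWSAuth;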
Thank you,
first I did run a
ALTER DATABASE P..s CLEAR AWSAuth;
which did not help.
After a stop/start of the instance and DB, stats did run.
Phew! Looks like the V instance feels OK again.
Best
Dieter
Hi all,
is AWSAuth something we SHOULD use if we are running our Vertica clusters outside AWS, in a private cloud?
I have just tested it and it works.
Should we ask for it to be added to the documentation through the Support Portal?
Regards
Jan Soubusta
Yes, you need to set AWSAuth for that case.
It is being added to the documentation.
Hi, I am setting up an IAM role to access S3. What is the COPY command if I use an IAM role?
Example COPY command using IAM auth to an S3 bucket:
COPY <schema>.<table name> FROM 's3://my/s3/bucket/file.ext';
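With an IAM role there is no AWSAuth to set; if the bucket is not in your default region, the AWSRegion parameter shown earlier in the thread presumably still needs to match it, e.g.:
-- Same call as earlier in the thread; adjust the region to match your bucket.
SELECT SET_CONFIG_PARAMETER('AWSRegion', 'eu-west-1');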