COPY data from a customer privided S3 bucket

Hi forum,

At this time I can COPY files from our own Vertica provided S3 bucket into an AWS EON V instance without issues:

, SoftwareModule
)
FROM 's3://p-vertica-poc/Data/01-iXR_cdf_events/dump-iXR_cdf_events-2016-01-01-12h-13h.csv.gz' gzip
DELIMITER ',' ENCLOSED BY '"' null E'\002NULL\003'
rejectmax 10
;

Unfortuately I get Vertica error 7160 (access denied) when trying this with files from the customer provided S3 bucket. awslib is available, we have run

ALTER SESSION SET UDPARAMETER FOR awslib aws_id=...
ALTER SESSION SET UDPARAMETER FOR awslib aws_secret=...
-- AWS Region: EU-West region: eu-west-1
select set_config_parameter('AWSRegion', 'eu-west-1');

Reaching the customer Bucket using the S3 Browser or the aws S3 command works.

Can you help me to better understand what the V doc says

Using COPY
Before you can read data from S3, you must create an IAM role for your EC2 instances to use, and grant that role permission to access your S3 resources.

Would you have experience in this area and have hints (or even better statements) for me what to do?

Thank you
Dieter

Comments

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Hi,

    Is the client using AWS Identity and Access Management (IAM)?

    Thanks!

  • Hi Jim,
    sorry for my ignorance on this topic.
    I am starting the COPY on the Linux node of the V AWS EON instance.
    I was not assuming the need for additional installations.
    How can I check the topic your raise?
    Thanks
    Dieter

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    You use IAM instead of setting the AWS id and secret key. But it looks like you are still setting those. So can you post the out put from he COPY command, including the error?

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited December 2017

    Note: Vertica ERROR 7160 is "Cannot expand glob pattern due to error: string"

    See:
    https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/ErrorCodes/SqlState-22023.htm

  • asaeidiasaeidi - Select Field - Employee

    Hi Dieter,

    You have two options:
    1) set session parameters (e.g. aws_id, aws_secret, aws_region), and use AWS UDX: COPY tbl WITH SOURCE S3(url='s3://bucket/...');
    2) set config parameters (AWSAuth, and AWSRegion), and use native S3 file system: COPY tbl FROM 's3://bucket/...',

    You are getting that error because you are setting the session parameters but using S3 file system.

  • Yes asaeidi,
    you point exactly to the very root of the issue!
    I want to run option 2 (assuming this is faster) but setting the parameters for option 1.
    I have found AWSRegion but did not find AWSAuth in the V doc:

    https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AdministratorsGuide/ConfiguringTheDB/AWSParameters.htm?Highlight=AWSregion

    What do I need to specify for AWSAuth, are there more parameters I need to set?
    Thank you
    Dieter

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited December 2017

    Like this:

    select set_config_parameter('AWSAuth','key:secret');

    key = aws_id
    secret = aws_secret

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited December 2017

    It's interesting that the AWSAuth parameter is not documented (i.e. not in the online docs). Plus you cannot see the AWSAuth parameter in configuration_parameters system table, only vs_configuration_parameters:

    dbadmin=> select parameter_name, description from configuration_parameters where parameter_name = 'AWSAuth';
     parameter_name | description
    ----------------+-------------
    (0 rows)
    
    dbadmin=> select parameter_name, description from vs_configuration_parameters where parameter_name = 'AWSAuth';
     parameter_name |                          description
    ----------------+----------------------------------------------------------------
     AWSAuth        | Keys to authenticate with AWS Services (<access key>:<secret>)
    (1 row)
    

    Is this a feature we should hide from clients?

  • The AWSAuth parameter took me one step further .. to vertica.log

    dbadmin(> )
    dbadmin-> FROM 's3://cf-s3-55a4828a-b432-4807-a648-c30cc3fc76e9/Data/01-iXR_cdf_events/dump-iXR_cdf_events-2016-01-01-12h-13h.csv.gz' gzip
    dbadmin-> DELIMITER ',' ENCLOSED BY '"' null E'\002NULL\003'
    dbadmin-> rejectmax 10
    dbadmin-> ;
    ERROR 8162: Something went wrong while copying the files to persistent location. Please look at the vertica.log for more details
    dbadmin=>

    Lets see ..
    Dieter

  • asaeidiasaeidi - Select Field - Employee

    You only need AWSAuth if you don't use IAM.
    We recommend using IAM, and that's why AWSAuth is not in the documentation.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    @vasaeidi - Gotcha! Thanks for the explanation.

  • Hi,
    I do append the lines from vertica.log which do show the failed COPY and the messages after this. What is the path I should take now please?
    Thanks
    Dieter

  • asaeidiasaeidi - Select Field - Employee

    From the logs it seems that after changing your global S3 file system configurations, you can no longer write to your communal location, and that is why copy fails:

    2017-12-06 16:08:16.415 Init Session:7fc7417fa700-a0000000004270 [Util] <INFO> Caught exception in Uploader Could not stat file [s3://philips-vertica-poc/EONros/bb4/02c5198768544bb809efcb20b1c94d2400a000000003e480_0.gt]: Access Denied
    2017-12-06 16:08:16.418 Init Session:7fc7417fa700-a0000000004270 [SAL] <WARNING> [S3UDFS] [RETRY] [Attempt 0/50] [Canceled? N] [Should retry? N] [Operation virtual void SAL::S3FileSystem::stat(const char*, stat*) const] AccessDenied (15) : 'Access Denied'
    2017-12-06 16:08:16.418 Init Session:7fc7417fa700-a0000000004270 <LOG> @v_philips_node0001: 42501/2898: Could not remove file [s3://philips-vertica-poc/EONros/bb4/02c5198768544bb809efcb20b1c94d2400a000000003e480_0.gt]: Could not stat file [s3://philips-vertica-poc/EONros/bb4/02c5198768544bb809efcb20b1c94d2400a000000003e480_0.gt]: Access Denied
    2017-12-06 16:08:16.506 Uploader:7fc6b5d6e700 [SAL] <WARNING> [S3UDFS] [RETRY] [Attempt 0/50] [Canceled? N] [Should retry? N] [Operation virtual void SAL::S3FileSystem::stat(const char*, stat*) const] AccessDenied (15) : 'Access Denied'
    2017-12-06 16:08:16.507 Uploader:7fc6b5d6e700 <LOG> @v_philips_node0001: 42501/2898: Could not remove file [s3://philips-vertica-poc/EONros/1e1/02c5198768544bb809efcb20b1c94d2400a000000003e51e_0.gt]: Could not stat file [s3://philips-vertica-poc/EONros/1e1/02c5198768544bb809efcb20b1c94d2400a000000003e51e_0.gt]: Access Denied
    2017-12-06 16:08:16.517 Spread Client:7fc76a7fc700 [Session] <WARNING> Trying to cancel Session v_philips_node0001-1800:0xaf but have not found any cancel handle.
    2017-12-06 16:08:16.518 Init Session:7fc7417fa700-a0000000004270 [Dist] <INFO> Cleaning up Uploader on fail
    
  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited December 2017

    Well, that sounds like its going to be an issue with an EON deployment!

    @DieterC - Looks like you will have to use option #1?

    1) set session parameters (e.g. aws_id, aws_secret, aws_region), and use AWS UDX: COPY tbl WITH SOURCE S3(url='s3://bucket/...');

    But I wonder if that will even work and result in the same "Access Denied" error?

  • Hi asaeidi,
    for this COPY I need access to two S3 buckets in a single command.
    1. "the Vertica bucket" holding the ROS containers
    2. The "customer provided bucket" keeping the source data.
    Are your comments saying that I can only access a single bucket for the execution of the COPY command?
    Do V assume the source data and the ROS containers are in the same bucket?
    Other queries run normal, also with setting AWSAuth to the customer bucket.

    Is there a chance to name two S3 buckets in the AWSAuth command?
    Thanks
    Dieter

  • asaeidiasaeidi - Select Field - Employee

    Dieter,
    For your use case, I would recommend you use AWS UDX (option 1).
    Currently, S3 file system does not support accessing two buckets in different regions at the same time.

  • Hi asaeidi,
    the two S3 buckets are in the same region, eu-west-1.
    Can I please reset AWSAuth? I am now getting problems when collecting statistics. I cannot access my own ROS

    dbadmin=> select ANALYZE_STATISTICS ('');
    WARNING 3314: Exception while running analyze_statistics: StorageBundleReader: Error opening file [s3://philips-vertica-poc/EONros/b 97/023f4086ac94eb50337005224d8d6e5600a0000000018696_0.gt] for read: Access Denied
    WARNING 3314: Exception while running analyze_statistics: StorageBundleReader: Error opening file [s3://philips-vertica-poc/EONros/d 48/023f4086ac94eb50337005224d8d6e5600a0000000018504_0.gt] for read: Access Denied

    Thanks
    Dieter

  • asaeidiasaeidi - Select Field - Employee

    I believe you could reset AWSAuth as below:

    select set_config_parameter('AWSAuth', NULL);
    
  • Thank you,
    first I did run a
    ALTER DATABASE P..s CLEAR AWSAuth;
    which did not help.
    After a stop/start of the instance and DB stats did run.
    Puh! Looks like the V instance feels OK again.
    Best
    Dieter

  • Hi all,
    is AWSAuth something we SHOULD use, if we are running our Vertica clusters outside AWS, in private cloud?
    I have just tested it and it works.
    Should we ask for add it into documentation through Support Portal?
    Regards

    Jan Soubusta

  • asaeidiasaeidi - Select Field - Employee

    Yes, you need to set AWSAuth for that case.
    It is being added to the documentation.

  • rkg112rkg112 Vertica Customer

    hi, I am setting up IAM role to access S3. What is the COPY command if I use IAM role?

  • sahil_kumarsahil_kumar Vertica Employee Employee
    edited May 2020

    Example Copy command using IAM Auth to S3 bucket
    COPY <schema>.<table name> FROM 's3://my/s3/bucket/file.ext';

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file