Cannot load parquet data from S3

Hello,

I am trying to load some Parquet files from S3 into Vertica 9.1 with the following commands:

ALTER SESSION SET UDPARAMETER FOR awslib aws_id=Y;
ALTER SESSION SET UDPARAMETER FOR awslib aws_secret=X;
ALTER SESSION SET UDPARAMETER FOR awslib aws_region='us-east-1';

TRUNCATE TABLE dev.my_logs;
COPY dev.my_logs
FROM 's3://my_bucket/test/01.parquet'
PARQUET
REJECTMAX 100;

However, I am getting the following error:
[Vertica]VJDBC ERROR: Cannot expand glob pattern due to error: Access Denied

I verified that the IAM user has no problem accessing the bucket or the file itself with:
aws s3 ls s3://my_bucket/test/01.parquet
2019-06-11 12:48:34 153600040 01.parquet

Thanks for any help/recommendations.

Comments

  • Bryan_H (Vertica Employee, Administrator)

    Hi, is the bucket encrypted, or maybe just this object? A quick way to check whether this is the issue is to try getting the object with "aws s3api get-object", e.g.
    aws s3api get-object --bucket text-content --key dir/my_images.tar.bz2 my_images.tar.bz2
    Then check for ServerSideEncryption or KMS tags in the output.
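
As a rough sketch of the check described above: `aws s3api head-object` returns the same metadata fields without downloading the object, so its JSON can be scanned for encryption markers. The function name `check_sse` is a hypothetical helper, and the field names are assumed to appear in the output only when the object is encrypted.

```shell
# Hypothetical helper: reads `aws s3api head-object` (or get-object) JSON on
# stdin and reports whether server-side encryption fields are present.
check_sse() {
  if grep -Eq '"(ServerSideEncryption|SSEKMSKeyId)"'; then
    echo "encrypted"
  else
    echo "no SSE fields found"
  fi
}

# Usage with the bucket and key from this thread (requires AWS credentials):
# aws s3api head-object --bucket my_bucket --key test/01.parquet | check_sse
```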

  • Bryan_H (Vertica Employee, Administrator)

    Also, please check vertica.log on the node you are connected to for detailed error messages, which likely include the root cause of the Access Denied message returned to the client.
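
One possible way to pull the relevant entries out of the log is sketched below. `find_load_errors` is a hypothetical helper, the search strings are taken from the error reported in this thread, and the log path is an assumption that depends on where your catalog directory lives.

```shell
# Hypothetical helper: list S3 load failures recorded in a Vertica log,
# prefixed with their line numbers.
# $1 = path to vertica.log (location depends on your catalog directory).
find_load_errors() {
  grep -n -E 'Access Denied|Cannot expand glob pattern' "$1"
}

# Usage (the path is a placeholder, not a real default):
# find_load_errors /path/to/catalog/vertica.log
```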

  • Hi Bryan,

    I am getting this error in Vertica.log:
    2019-06-11 11:53:25.406 Init Session:7ef9c8bcb700-a00000007e72d2 [Txn] Begin Txn: a00000007e72d2 'COPY dev.my_logs
    FROM 's3://my_bucket/test/01.parquet'
    PARQUET
    REJECTMAX 1000;'
    2019-06-11 11:53:25.431 Init Session:7ef9c8bcb700-a00000007e72d2 @v_a3db_node0001: 22023/7160: Cannot expand glob pattern due to error: Access Denied
    LOCATION: expandGlobLocal, /scratch_a/release/svrtar30992/vbuild/vertica/Optimizer/Path/BulkLoad.cpp:2397

    I have no problem accessing the object when I do not use the PARQUET directive.
    For instance, if I run this command as if it were a standard load from S3 (not that it makes sense for a Parquet file):
    COPY dev.my_logs
    SOURCE S3(bucket='s3://my_bucket/test/01.parquet')
    REJECTMAX 1000;
    I would get the following error:
    [Vertica]VJDBC ERROR: COPY: [1000] records have been rejected
    which means Vertica has no problem accessing the file.

  • Bryan_H (Vertica Employee, Administrator)

    What exact version of Vertica are you running? Also, can you try this:
    COPY dev.my_logs WITH SOURCE S3(bucket='s3://my_bucket/test/01.parquet') PARQUET REJECTMAX 1000;

  • LenoyJ (Employee)
    edited June 2019

    If you want to configure credentials using vsql on Vertica >9.1, use:
    ALTER SESSION SET AWSAuth='ID:SECRET';

  • Version: Vertica Analytic Database v9.1.0-1

    Running the command above with WITH SOURCE gives me the following error:
    Syntax error at or near "PARQUET"

  • Hi All,

    LenoyJ's suggestion for configuring credentials solved the issue:
    ALTER SESSION SET AWSAuth='ID:SECRET';

    needs to be called instead of:
    ALTER SESSION SET UDPARAMETER FOR awslib aws_id=ID; ALTER SESSION SET UDPARAMETER FOR awslib aws_secret=SECRET;

    Vertica team, please note that this is not required when loading CSV files from S3, only when loading Parquet.

    Thanks for your help!
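
For reference, the working sequence from this thread could be scripted roughly as follows. This is only a sketch: `parquet_copy_sql` is a hypothetical helper that prints the two statements, it does no quoting or escaping of its arguments, and piping the result into vsql assumes vsql is on the PATH with connection settings supplied elsewhere.

```shell
# Hypothetical helper: print the session setup and COPY statement that worked
# in this thread.
# $1 = access key ID, $2 = secret key, $3 = S3 path, $4 = target table.
parquet_copy_sql() {
  printf "ALTER SESSION SET AWSAuth='%s:%s';\n" "$1" "$2"
  printf "COPY %s FROM '%s' PARQUET REJECTMAX 100;\n" "$4" "$3"
}

# Usage (assumes vsql is installed and can connect without further options):
# parquet_copy_sql "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" \
#   's3://my_bucket/test/01.parquet' dev.my_logs | vsql
```

Both statements have to run in the same session, because the AWSAuth setting is session-scoped here.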

  • dsprogis (Employee)

    Hi Atomix,
    Thank you for taking the time to circle back with confirmation! I have initiated a change to the Documentation per your experience and finding. We really appreciate your commitment to improving usability and, in this case, documentation.
    Best,
    Dave
