Cannot load parquet data from S3

atomix · June 2019

Hello,

I am trying to load some Parquet files from S3 into Vertica 9.1 with the following commands:

ALTER SESSION SET UDPARAMETER FOR awslib aws_id=Y;
ALTER SESSION SET UDPARAMETER FOR awslib aws_secret=X;
ALTER SESSION SET UDPARAMETER FOR awslib aws_region='us-east-1';

TRUNCATE TABLE dev.my_logs;
COPY dev.my_logs
FROM 's3://my_bucket/test/01.parquet'
PARQUET
REJECTMAX 100;

However, I am getting the following error:
[Vertica]VJDBC ERROR: Cannot expand glob pattern due to error: Access Denied

I verified that the IAM user has no problem accessing the bucket or the file itself with:
aws s3 ls s3://my_bucket/test/01.parquet
2019-06-11 12:48:34 153600040 01.parquet

Thanks for any help/recommendations.

Bryan_H · June 2019

Hi, is the bucket encrypted, or maybe just this object? A quick way to check whether this is the issue is to try getting the object with "aws s3api get-object", e.g.
aws s3api get-object --bucket text-content --key dir/my_images.tar.bz2 my_images.tar.bz2
Then check for ServerSideEncryption or KMS tags in the output.
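If you save what get-object prints to the terminal (the object metadata as JSON), a small script can pull out the encryption-related fields. This is a minimal sketch that assumes you have already captured that JSON as a string. The sample below is made up for illustration, and `encryption_info` is a hypothetical helper, not part of the AWS CLI.

```python
import json


def encryption_info(get_object_output: str) -> dict:
    """Extract server-side encryption fields from `aws s3api get-object` JSON output."""
    metadata = json.loads(get_object_output)
    # ServerSideEncryption is e.g. "AES256" or "aws:kms"; SSEKMSKeyId and
    # SSECustomerAlgorithm appear only for KMS- or customer-key-encrypted objects.
    return {k: v for k, v in metadata.items()
            if k == "ServerSideEncryption" or k.startswith("SSE")}


# Hypothetical metadata captured from the get-object command (values made up).
sample = '''{
  "ContentLength": 153600040,
  "ServerSideEncryption": "aws:kms",
  "SSEKMSKeyId": "arn:aws:kms:us-east-1:111122223333:key/example"
}'''

info = encryption_info(sample)
if info:
    print("Object is encrypted:", info)
else:
    print("No server-side encryption fields found")
```

A KMS key ID in the result suggests the IAM user (or the credentials Vertica uses) also needs permission to use that key, not just to read the bucket.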

Bryan_H · June 2019

Also, please check vertica.log on the node you are connected to for detailed error messages, likely including the root cause of the Access Denied message that is returned to the client.
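vertica.log can be large, so it helps to filter it for the message the client saw. This is a minimal sketch; `find_errors` is a hypothetical helper, and the log location depends on your catalog directory, so no path is assumed here. The sample lines are taken from the log excerpt posted later in this thread.

```python
def find_errors(log_lines, phrase="Access Denied"):
    """Return (line_number, text) for every log line that contains `phrase`."""
    return [(n, line.rstrip("\n"))
            for n, line in enumerate(log_lines, start=1)
            if phrase in line]


# In practice: with open(path_to_vertica_log) as f: hits = find_errors(f)
sample_log = [
    "2019-06-11 11:53:25.406 Init Session:7ef9c8bcb700-a00000007e72d2 [Txn] Begin Txn: a00000007e72d2 'COPY dev.my_logs\n",
    "2019-06-11 11:53:25.431 Init Session:7ef9c8bcb700-a00000007e72d2 @v_a3db_node0001: 22023/7160: Cannot expand glob pattern due to error: Access Denied\n",
]

for number, text in find_errors(sample_log):
    print(number, text)
```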

atomix · June 2019

Hi Bryan,

I am getting this error in vertica.log:
2019-06-11 11:53:25.406 Init Session:7ef9c8bcb700-a00000007e72d2 [Txn] Begin Txn: a00000007e72d2 'COPY dev.my_logs
FROM 's3://my_bucket/test/01.parquet'
PARQUET
REJECTMAX 1000;'
2019-06-11 11:53:25.431 Init Session:7ef9c8bcb700-a00000007e72d2 @v_a3db_node0001: 22023/7160: Cannot expand glob pattern due to error: Access Denied
LOCATION: expandGlobLocal, /scratch_a/release/svrtar30992/vbuild/vertica/Optimizer/Path/BulkLoad.cpp:2397

I have no problem accessing the object when I don't use the PARQUET directive.
For instance, if I run this command (not that it makes sense) as a standard load from S3:
COPY dev.my_logs
SOURCE S3(bucket='s3://my_bucket/test/01.parquet')
REJECTMAX 1000;
I get the following error:
[Vertica]VJDBC ERROR: COPY: [1000] records have been rejected
meaning Vertica has no problem accessing the file.

Bryan_H · June 2019

What exact version of Vertica are you running? Also, can you try this:
COPY dev.my_logs WITH SOURCE S3(bucket='s3://my_bucket/test/01.parquet') PARQUET REJECTMAX 1000;

LenoyJ · June 2019

If you want to configure credentials using vsql on 9.1 and later, use:
ALTER SESSION SET AWSAuth='ID:SECRET';

atomix · June 2019

Version: Vertica Analytic Database v9.1.0-1

Running the command above with WITH SOURCE gives me the following error:
Syntax error at or near "PARQUET"

atomix · June 2019

Hi All,

LenoyJ's suggestion for configuring credentials solved the issue. This statement:
ALTER SESSION SET AWSAuth='ID:SECRET';

needs to be called instead of:
ALTER SESSION SET UDPARAMETER FOR awslib aws_id=ID;
ALTER SESSION SET UDPARAMETER FOR awslib aws_secret=SECRET;
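Note that AWSAuth takes the key ID and secret as a single 'ID:SECRET' string rather than two separate parameters. If you generate this statement from a script, a minimal sketch could look like the following; `aws_auth_statement` is a hypothetical helper, and the only assumption beyond the thread is standard SQL quoting (doubling any single quote inside the literal).

```python
def aws_auth_statement(access_key_id: str, secret_access_key: str) -> str:
    """Build the session-level AWSAuth statement used in this thread (hypothetical helper)."""
    # AWSAuth expects one 'ID:SECRET' value; double any single quotes so the
    # string literal stays valid SQL.
    value = f"{access_key_id}:{secret_access_key}".replace("'", "''")
    return f"ALTER SESSION SET AWSAuth='{value}';"


print(aws_auth_statement("ID", "SECRET"))
```

Keep real secrets out of scripts and logs; read them from the environment or a credentials store instead of hard-coding them.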

Vertica team, please note that this is not required when loading CSV files from S3, only when loading Parquet.

Thanks for your help!

dsprogis · June 2019

Hi Atomix,
Thank you for taking the time to circle back with confirmation! I have initiated a change to the Documentation per your experience and finding. We really appreciate your commitment to improving usability and, in this case, documentation.
Best,
Dave
