Load data from a Spark DataFrame to Vertica from AWS Glue

pg_operative, Registered User

We are using Vertica version 9.2.1, with AWS Glue as the ETL tool. We are trying to load data from a PySpark DataFrame into Vertica and get the error below:

An error occurred while calling o1907.save.
java.lang.Exception: ERROR: S2V.save(): did not pass the Vertica requirements pre-check. The following problems were encountered: hdfs_url scheme should be 'hdfs', but user provided:null. hdfs_url path is not valid, user provided:. java.lang.IllegalArgumentException: Can not create a Path from an empty string
at com.vertica.spark.s2v.S2V.save(S2V.scala:491)

# Connector options (sensitive values masked)
opts = {}
opts['dbschema'] = 'staging'
opts['table'] = 'fact_rating_aggregate_stage'
opts['db'] = '*'
opts['user'] = 'etluser'
opts['password'] = '****'
opts['host'] = '***'

# Data loading step
dataFrame4.write.save(format="com.vertica.spark.datasource.DefaultSource", mode="append", **opts)

Glue is a serverless service: it provisions servers on the fly to process the data. I have not set hdfs_url because its value keeps changing.
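
For reference, the two hdfs_url complaints in the error message can be roughly reproduced locally before calling save. This is only a minimal sketch of that kind of check, not the connector's actual code: check_hdfs_url is a hypothetical helper, and the URL in the usage note is a placeholder rather than a real cluster address.

```python
from urllib.parse import urlparse


def check_hdfs_url(opts):
    """Return a list of problems found with opts['hdfs_url'] (empty if none).

    Hypothetical helper: it only approximates the two complaints in the
    error message (wrong scheme, empty path). It is not the connector's code.
    """
    problems = []
    url = opts.get("hdfs_url") or ""
    parsed = urlparse(url)
    if parsed.scheme != "hdfs":
        problems.append(
            "hdfs_url scheme should be 'hdfs', but got: %r" % (parsed.scheme or None)
        )
    if not parsed.path:
        problems.append("hdfs_url path is empty")
    return problems
```

With the options shown above, which contain no hdfs_url, check_hdfs_url(opts) reports both problems. A placeholder value such as 'hdfs://namenode:8020/tmp/s2v' passes this local check, although that alone does not mean the connector or the cluster would accept it.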
