
Not able to write to Vertica table from Spark

I am trying to write data to Vertica from Spark but am getting the error below.

py4j.protocol.Py4JJavaError: An error occurred while calling o157.save.
java.lang.Exception: S2V: FATAL ERROR for job S2V_job3607800361496405078. Job status information is available in the Vertica table SBG_PUBLISHED.S2V_JOB_STATUS_USER_SVENKATESH. Unable to create/insert into target table SBG_PUBLISHED.chs_test with SaveMode: Append. ERROR MESSAGE: ERROR: java.sql.SQLException: [Vertica]VJDBC ERROR: Failed to glob [webhdfs://ip-10-68-67-66.us-west-2.compute.internal:8020/tmp/S2V_job3607800361496405078/*.orc] because of error: [http://ip-10-68-67-66.us-west-2.compute.internal:8020/webhdfs/v1/tmp/S2V_job3607800361496405078/?op=LISTSTATUS&user.name=svenkatesh]: Curl Error: Couldn't connect to server
Error Details: Failed to connect to ip-10-68-67-66.us-west-2.compute.internal port 8020: Connection timed out
at com.vertica.spark.s2v.S2V.do2Stage(S2V.scala:339)
at com.vertica.spark.s2v.S2V.save(S2V.scala:389)
at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:100)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)

Spark code:

from pyspark.sql import SparkSession
from pyspark.sql.functions import *

spark = SparkSession \
.builder \
.master("local") \
.appName("CHS") \
.enableHiveSupport() \
.getOrCreate()

# Vertica Spark connector (S2V) options
opts = {}
opts['db'] = 'idea'
opts['user'] = ''
opts['password'] = ''
opts['host'] = 'prodint.vertica-sbg'
opts['dbschema'] = 'SBG_PUBLISHED'
opts['table'] = 'chs_test'
opts['hdfs_url'] = 'hdfs://ip-10-68-67-66.us-west-2.compute.internal:8020/tmp'
opts['web_hdfs_url'] = 'webhdfs://ip-10-68-67-66.us-west-2.compute.internal:8020/tmp'

df_load = spark.sql("select 'test' as name")
df_load.write.save(format="com.vertica.spark.datasource.DefaultSource", mode="append", **opts)
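The Curl error shows Vertica making an HTTP request to port 8020 and timing out. Port 8020 is conventionally the HDFS NameNode RPC port, while WebHDFS listens on the NameNode's HTTP port (50070 by default on Hadoop 2.x, 9870 on Hadoop 3.x), so `web_hdfs_url` likely needs a different port than `hdfs_url`. A minimal sketch of how the options could be built, assuming 50070 is this cluster's WebHDFS port (the helper `build_s2v_opts` and the credentials are hypothetical; verify the real port with `hdfs getconf -confKey dfs.namenode.http-address`):

```python
# Hypothetical helper: build the S2V connector options with the webhdfs://
# URL pointed at the NameNode HTTP port instead of the RPC port (8020).
def build_s2v_opts(namenode_host, rpc_port=8020, http_port=50070):
    """Return Vertica Spark connector options.

    Assumption: 50070 is the WebHDFS (HTTP) port on this cluster;
    on Hadoop 3.x the default is 9870.
    """
    return {
        'db': 'idea',
        'user': '',                    # placeholder credentials
        'password': '',
        'host': 'prodint.vertica-sbg',
        'dbschema': 'SBG_PUBLISHED',
        'table': 'chs_test',
        # hdfs:// uses the RPC port; webhdfs:// must use the HTTP port
        'hdfs_url': f'hdfs://{namenode_host}:{rpc_port}/tmp',
        'web_hdfs_url': f'webhdfs://{namenode_host}:{http_port}/tmp',
    }

opts = build_s2v_opts('ip-10-68-67-66.us-west-2.compute.internal')
print(opts['web_hdfs_url'])
```

Note that Vertica itself (not Spark) fetches the staged ORC files over WebHDFS, so the Vertica nodes must also be able to reach that host and port; a firewall or security group blocking it would produce the same "Couldn't connect to server" timeout.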
