Not able to write to Vertica table from Spark — Vertica Forum

I am trying to write data to Vertica from Spark but am getting the error below.

py4j.protocol.Py4JJavaError: An error occurred while calling o157.save.
java.lang.Exception: S2V: FATAL ERROR for job S2V_job3607800361496405078. Job status information is available in the Vertica table SBG_PUBLISHED.S2V_JOB_STATUS_USER_SVENKATESH. Unable to create/insert into target table SBG_PUBLISHED.chs_test with SaveMode: Append. ERROR MESSAGE: ERROR: java.sql.SQLException: [Vertica]VJDBC ERROR: Failed to glob [webhdfs://ip-10-68-67-66.us-west-2.compute.internal:8020/tmp/S2V_job3607800361496405078/*.orc] because of error: [http://ip-10-68-67-66.us-west-2.compute.internal:8020/webhdfs/v1/tmp/S2V_job3607800361496405078/?op=LISTSTATUS&user.name=svenkatesh]: Curl Error: Couldn't connect to server
Error Details: Failed to connect to ip-10-68-67-66.us-west-2.compute.internal port 8020: Connection timed out
at com.vertica.spark.s2v.S2V.do2Stage(S2V.scala:339)
at com.vertica.spark.s2v.S2V.save(S2V.scala:389)
at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:100)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
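The curl error is the key part of the trace: Vertica tried to list the staged ORC files over WebHDFS at http://…:8020/webhdfs/v1/… and the connection timed out. One possible explanation (an assumption, not a confirmed diagnosis): port 8020 is typically the NameNode's RPC port, while WebHDFS speaks HTTP on the NameNode's web port (50070 by default on Hadoop 2.x, 9870 on 3.x), so the URL may simply point at a port that does not serve HTTP; a firewall between the Vertica nodes and the NameNode would produce the same timeout. The hypothetical helper below rebuilds the LISTSTATUS URL from the trace, so you can probe both ports with curl from a Vertica node:

```python
from urllib.parse import urlencode

def webhdfs_liststatus_url(host, port, path, user):
    """Rebuild the WebHDFS LISTSTATUS URL of the form seen in the error message."""
    query = urlencode({"op": "LISTSTATUS", "user.name": user})
    return "http://{}:{}/webhdfs/v1{}?{}".format(host, port, path, query)

# The URL from the stack trace (port 8020, usually the NameNode RPC port):
failing = webhdfs_liststatus_url(
    "ip-10-68-67-66.us-west-2.compute.internal", 8020,
    "/tmp/S2V_job3607800361496405078/", "svenkatesh")
print(failing)

# Candidate to probe instead (assumption: Hadoop 2.x default WebHDFS HTTP port):
candidate = webhdfs_liststatus_url(
    "ip-10-68-67-66.us-west-2.compute.internal", 50070,
    "/tmp/S2V_job3607800361496405078/", "svenkatesh")
print(candidate)
```

Running `curl` against each printed URL from a Vertica node should show which port actually answers WebHDFS requests.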

Spark code:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .master("local") \
    .appName("CHS") \
    .enableHiveSupport() \
    .getOrCreate()

# Connection options for the Vertica S2V (Spark-to-Vertica) connector
opts = {}
opts['db'] = 'idea'
opts['user'] = ''      # redacted
opts['password'] = ''  # redacted
opts['host'] = 'prodint.vertica-sbg'
opts['dbschema'] = 'SBG_PUBLISHED'
opts['table'] = 'chs_test'
# Staging location: Spark writes ORC files here, then Vertica reads
# them back through the WebHDFS URL
opts['hdfs_url'] = 'hdfs://ip-10-68-67-66.us-west-2.compute.internal:8020/tmp'
opts['web_hdfs_url'] = 'webhdfs://ip-10-68-67-66.us-west-2.compute.internal:8020/tmp'

df_load = spark.sql("select 'test' as name")
df_load.write.save(format="com.vertica.spark.datasource.DefaultSource", mode="append", **opts)
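If the timeout really is the wrong port in web_hdfs_url, one thing to try is pointing it at the NameNode's HTTP port while leaving hdfs_url on the RPC port. This is a sketch under assumptions, not a confirmed fix: 50070 is only the Hadoop 2.x default (9870 on 3.x), your cluster may differ, and the port must also be reachable from the Vertica nodes.

```python
# Hypothetical adjustment: webhdfs:// is an HTTP protocol, so it should use
# the NameNode's HTTP port, while hdfs:// uses the RPC port (8020).
namenode = "ip-10-68-67-66.us-west-2.compute.internal"

opts = {}
opts['hdfs_url'] = "hdfs://{}:8020/tmp".format(namenode)          # RPC port
opts['web_hdfs_url'] = "webhdfs://{}:50070/tmp".format(namenode)  # assumed HTTP port
print(opts['web_hdfs_url'])
```

The same opts dict is then passed to df_load.write.save(...) exactly as in the code above.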
