Vertica-Spark - AWS EMR - Failure due to timeout to get job status using webhdfs

Hi Folks:

I am following the example on AWS EMR using spark-shell.

I am using

  1. Spark 2.1.0
  2. vertica-8.1.0_spark2.0_scala2.11.jar
  3. vertica-jdbc-8.1.0-3.jar

I am getting the error mentioned below.

I've verified that, from the EMR master node, I can use curl to reach the URL mentioned in the error.

Please let me know how to resolve this issue.


Curl results:

curl ''
{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"","message":"File /user/test/vertica/S2V_job5301965226302870212/ does not exist."}}

Exception in spark-shell when saving the DataFrame to Vertica:

17/06/09 21:23:19 ERROR S2V: Failed to save DataFrame to Vertica table: public.S2V_test_table
java.lang.Exception: S2V: FATAL ERROR for job S2V_job5301965226302870212. Job status information is available in the Vertica table public.S2V_JOB_STATUS_USER_DBADMIN. Unable to create/insert into target table public.S2V_test_table with SaveMode: Append. ERROR MESSAGE: ERROR: java.sql.SQLException: [Vertica]VJDBC ERROR: ****Failed to glob "webhdfs://*.orc" because of error: []: Curl Error: Couldn't connect to server
Error Details: Failed to connect to port 50070: Connection timed out****

at com.vertica.spark.s2v.S2V.do2Stage(S2V.scala:342)
at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:88)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
... 54 elided


  • Hi:

    The issue is resolved. The failure was caused by a connectivity problem between the Vertica cluster and the Hadoop cluster.
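
For anyone hitting the same error: the "Failed to connect to port 50070" message comes from the Vertica side, so a curl test from the EMR master node does not exercise the path that is failing. A minimal sketch of a check that could be run from each Vertica node is shown below. The function name `can_connect` is hypothetical and not part of any Vertica or Spark tooling. It also assumes the Hadoop 2.x default ports (50070 for the NameNode web/WebHDFS endpoint, 50075 for DataNodes); your cluster may use different ones.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This only tests basic network reachability (routing, security groups,
    firewalls). It does not validate WebHDFS itself.
    """
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False
```

For example, running `can_connect("<namenode-host>", 50070)` on a Vertica node, with the placeholder replaced by the host from the webhdfs URL in the error, should return True once the network path is open. Because WebHDFS redirects reads to DataNodes, it is probably worth checking the DataNode hosts and port as well.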


  • Prakhar84 (Vertica Customer)

    Hi mans4singh,
    How was the issue solved? Any details would be highly appreciated.
