Vertica-Spark - AWS EMR - Failure due to timeout to get job status using webhdfs — Vertica Forum

Hi Folks:

I am following the example at https://my.vertica.com/get-started-vertica/integrating-apache-spark/ on AWS EMR, using spark-shell.

I am using:

  1. Spark 2.1.0
  2. vertica-8.1.0_spark2.0_scala2.11.jar
  3. vertica-jdbc-8.1.0-3.jar

I am getting the error mentioned below.

I've validated that, from the EMR master node, I can use curl to reach the URL mentioned in the error.

Please let me know how to resolve this issue.

Thanks

Curl results:

curl 'http://xxx-xx-x-xxx.us-west-2.compute.internal:50070/webhdfs/v1/user/test/vertica/S2V_job5301965226302870212/?user.name=dbadmin&op=LISTSTATUS'
{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File /user/test/vertica/S2V_job5301965226302870212/ does not exist."}}

Exception in spark-shell when saving the DataFrame to Vertica:

17/06/09 21:23:19 ERROR S2V: Failed to save DataFrame to Vertica table: public.S2V_test_table
java.lang.Exception: S2V: FATAL ERROR for job S2V_job5301965226302870212. Job status information is available in the Vertica table public.S2V_JOB_STATUS_USER_DBADMIN. Unable to create/insert into target table public.S2V_test_table with SaveMode: Append. ERROR MESSAGE: ERROR: java.sql.SQLException: [Vertica]VJDBC ERROR: ****Failed to glob "webhdfs://xxx-xx-x-xxx.us-west-2.compute.internal:50070/user/test/vertica/S2V_job5301965226302870212/*.orc" because of error: [http://ip-xxx-xx-x-xxx.us-west-2.compute.internal:50070/webhdfs/v1/user/test/vertica/S2V_job5301965226302870212/?user.name=dbadmin&op=LISTSTATUS]: Curl Error: Couldn't connect to server
Error Details: Failed to connect to ip-xxx-xx-x-xxx.us-west-2.compute.internal port 50070: Connection timed out****

at com.vertica.spark.s2v.S2V.do2Stage(S2V.scala:342)
at com.vertica.spark.s2v.S2V.save(S2V.scala:392)
at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:88)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
... 54 elided

Comments

  • Hi:

    The issue is resolved. The failure was caused by a connectivity problem between Vertica and the Hadoop cluster.

    Thanks

  • Prakhar84 (Vertica Customer)

    Hi mans4singh,
    How was the issue solved? Any details would be highly appreciated.
    P
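
The thread does not record how the connectivity was fixed. The error text shows Vertica timing out on port 50070, while curl from the EMR master node did get a response, so one way to narrow this down is to test the same host and port from each Vertica node. Below is a minimal Python sketch, assuming Python 3 is available on those nodes. `can_connect` is a hypothetical helper name, and the host in the example call is the redacted placeholder from the error, not a real address.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it only tests TCP reachability
    and does not issue any WebHDFS request.
    """
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


# Example call (placeholder host copied from the error message above):
# can_connect("ip-xxx-xx-x-xxx.us-west-2.compute.internal", 50070)
```

If this returns False on a Vertica node but True on the EMR master, the network path between Vertica and the NameNode (for example security group or firewall rules) is a likely place to look. WebHDFS reads are usually redirected to DataNodes, so the DataNode HTTP port (50075 by default in Hadoop 2.x) may need the same check.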
