Vertica Spark Connector throws a java.lang.NullPointerException
I downloaded the Vertica Spark connector and tried the example shown in the connector guide. When I write a DataFrame to Vertica from the Spark-shell using this statement:
df.write.format("com.vertica.spark.datasource.DefaultSource").options(opts).mode(saveMode).save()
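For reference, opts and saveMode are defined roughly as follows (the connection values are placeholders for my setup, and the option names follow the guide's example):

import org.apache.spark.sql.SaveMode

// Connector options: target table plus Vertica connection details
val opts = Map(
  "table" -> "test_table",
  "db" -> "testdb",
  "user" -> "dbadmin",
  "password" -> "*****",
  "host" -> "vertica-host"
)
val saveMode = SaveMode.Append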
I get the following exception. It looks like I have run into a bug in the Vertica connector code. Does anybody know a workaround? Thanks.
15/12/14 22:19:08 ERROR TaskSetManager: Task 2 in stage 2.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 2.0 failed 4 times, most recent failure: Lost task 2.3 in stage 2.0 (TID 93, 10.172.137.138): java.lang.Exception:
Partition[2]: ERROR: Failed while COPYing data to Vertica. partition=2. Error message:java.lang.NullPointerException
at com.vertica.spark.s2v.S2V$$anonfun$1.apply(S2V.scala:199)
at com.vertica.spark.s2v.S2V$$anonfun$1.apply(S2V.scala:113)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:703)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:703)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Comments
Hi Mohammed,
This can be fixed by setting the connector's "tmpdir" option, e.g. "tmpdir" -> "/tmp", or whatever directory you prefer. The user guide indicates that setting "tmpdir" is optional, but unfortunately we have found that this is not the case for all users. We will correct the documentation.
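For example, assuming the opts map from your question, the write would become (the directory is just an example; use any location writable on the executor nodes):

// Add tmpdir to the existing options map and write as before
val optsWithTmpdir = opts + ("tmpdir" -> "/tmp")
df.write.format("com.vertica.spark.datasource.DefaultSource").options(optsWithTmpdir).mode(saveMode).save()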
Since our tmpdir default relies on spark.local.dir, another way to fix this is to set spark.local.dir in your conf/spark-defaults.conf file, or to set SPARK_LOCAL_DIRS in your spark-env.sh file. To check the current value of spark.local.dir, look at the Environment tab of the Spark web UI, or query it from a running spark-shell.
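A minimal check from the spark-shell, assuming sc is the usual SparkContext (getOption returns None when the property is unset, in which case Spark falls back to its default, typically /tmp):

// Inspect the effective spark.local.dir setting
sc.getConf.getOption("spark.local.dir")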
Thank you,
Jeff