
Vertica Spark Connector throws a java.lang.NullPointerException

I downloaded the Vertica Spark connector and tried the example shown in the connector guide. When I write a DataFrame to Vertica from the Spark-shell using this statement:




I get the following exception. It looks like I have run into a bug in the Vertica connector code. Does anybody know a workaround? Thanks.


15/12/14 22:19:08 ERROR TaskSetManager: Task 2 in stage 2.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 2.0 failed 4 times, most recent failure: Lost task 2.3 in stage 2.0 (TID 93, java.lang.Exception:
Partition[2]: ERROR: Failed while COPYing  data to Vertica.  partition=2. Error message:java.lang.NullPointerException
        at com.vertica.spark.s2v.S2V$$anonfun$1.apply(S2V.scala:199)
        at com.vertica.spark.s2v.S2V$$anonfun$1.apply(S2V.scala:113)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:703)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:703)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
        at org.apache.spark.executor.Executor$
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$


  • Hi Mohammed, 

    This can be fixed by setting the connector's "tmpdir" option, e.g. "tmpdir" -> "/tmp", or whatever directory you prefer. The user guide indicates that setting "tmpdir" is optional, but unfortunately we have found that this is not the case for all users. We will correct this.
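
    For reference, a minimal sketch of the workaround in the spark-shell. Only the "tmpdir" entry comes from the fix above; every other option name and value here (host, db, user, password, table) is a hypothetical placeholder, so check the connector guide for the exact option names your version expects:

```scala
// Sketch only: add "tmpdir" to the connector options map before saving.
// All entries other than "tmpdir" are illustrative placeholders.
val opts = Map(
  "host"     -> "vertica-host.example.com",
  "db"       -> "mydb",
  "user"     -> "dbadmin",
  "password" -> "secret",
  "table"    -> "mytable",
  "tmpdir"   -> "/tmp"  // the workaround: set tmpdir explicitly
)
// The save call itself depends on your connector version, e.g. something like:
// df.write.options(opts). ... .save()
```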


    Since our tmpdir relies on spark.local.dir, another way to fix this is to set spark.local.dir in your conf/spark-defaults.conf file, or to set SPARK_LOCAL_DIRS in your conf/spark-env.sh file. To check the current value of spark.local.dir, you can refer to the Environment tab of the Spark Master web interface, or type the following in the spark-shell:

    sqlContext.getConf("spark.local.dir", "this is not set")
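
    For reference, here is what the two configuration routes look like; the directory shown is just an example, use any writable path you prefer:

```
# conf/spark-defaults.conf  (property-file route; example path)
spark.local.dir    /tmp/spark-local

# conf/spark-env.sh  (equivalent environment-variable route)
export SPARK_LOCAL_DIRS=/tmp/spark-local
```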


    Thank you,

