Vertica Spark Connector - issues with java.sql.Timestamp and Date

Hi,

I've just tried using the Vertica Spark connector, as it seemed promising on paper.

However, it fails when saving data to Avro files if a column is of type java.sql.Date or java.sql.Timestamp.

Those are the types Spark uses to represent date and timestamp values in a DataFrame's rows.

Here is a quick way to test this:

// Paste into spark-shell with :paste (sqlContext is predefined there)
import java.sql.Timestamp
import org.apache.spark.sql.SaveMode
import sqlContext.implicits._

// Hypothetical placeholder connection settings: replace with real values
val db = "mydb"
val user = "dbadmin"
val pwd = "password"
val host = "vertica-host"
val dbSchema = "public"

case class Person(name: String, age: Int, height: Float, lastSeen: Timestamp)
val lastSeen = new Timestamp(1454413911000L) // 2016-02-02 11:51:51 UTC
val persons = List(Person("Adidas", 23, 1.72f, lastSeen))
val df = sqlContext.createDataFrame(persons)

val opts = Map(
  "db" -> db,
  "user" -> user,
  "password" -> pwd,
  "host" -> host,
  "tmpdir" -> "/tmp",
  "numPartitions" -> "4",
  "dbschema" -> dbSchema)

df.write
  .format("com.vertica.spark.datasource.DefaultSource")
  .options(opts + ("table" -> "test_spark_connector"))
  .mode(SaveMode.Overwrite)
  .save()

The following error is being thrown:

Partition[3]: ERROR: Failed while COPYing  data to Vertica.  partition=3. Error message:java.lang.Exception: ERROR: S2VSaver.writeToAvroFile(): Failed writing to local avro file at path:/tmp/spark-vertica-connector-tmpfile-3492120993874597034-part-3.avro.  ERROR:org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.AvroRuntimeException: Unknown datum type java.sql.Timestamp: 2016-02-02 11:51:51.0
at com.vertica.spark.s2v.S2V$$anonfun$2.apply(S2V.scala:210)
at com.vertica.spark.s2v.S2V$$anonfun$2.apply(S2V.scala:124)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Am I missing something?
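The stack trace shows Avro's generic writer rejecting the value, because java.sql.Timestamp is not one of the Java types it knows how to serialize. Below is a minimal sketch of the kind of conversion that avoids this. It assumes you are willing to store timestamps as epoch milliseconds and dates as days since the epoch, which are the encodings used by Avro's timestamp-millis and date logical types. The names `AvroValueConversion` and `toAvroValue` are hypothetical and are not part of the connector or of Avro:

```scala
import java.sql.{Date, Timestamp}

object AvroValueConversion {
  // Hypothetical helper: maps JDBC date/time values to Avro-friendly primitives.
  //   Timestamp -> Long milliseconds since the Unix epoch (Avro "timestamp-millis" encoding)
  //   Date      -> Int days since the Unix epoch (Avro "date" encoding)
  // Any other value is passed through unchanged.
  def toAvroValue(value: Any): Any = value match {
    case ts: Timestamp => ts.getTime
    case d: Date       => d.toLocalDate.toEpochDay.toInt
    case other         => other
  }

  def main(args: Array[String]): Unit = {
    // Same row as in the Person example above
    val lastSeen = new Timestamp(1454413911000L)
    val row = Seq[Any]("Adidas", 23, 1.72f, lastSeen)
    println(row.map(toAvroValue))
  }
}
```

This sketch only shows the value mapping. Applying the same idea in Spark would mean converting the affected DataFrame columns before calling `df.write`. The resulting Vertica column types would then depend on how the connector maps the new types.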

Comments

  • Hi,

     

    Apologies for the inconvenience.

    This is a known issue in the currently available version of the connector. It will be fixed in the upcoming release, which will convert Java Dates and Timestamps properly before writing them to Avro.

    Thank you,

    Edward

  • Hi Edward,

    Thanks for the reply; good to know.
    Where is the best place to follow updates about the Vertica Spark connector?

  • Hi,

     

    For now, we will use our mailing list (which you should already be subscribed to) to provide updates about the connector.

    Thanks,

    Edward

  • Check this one: Convert Timestamp to Date

    Eldo
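Following Eldo's suggestion, here is a minimal plain-Scala sketch of converting a java.sql.Timestamp to a java.sql.Date (Java 8 or later). The names `TimestampToDate` and `toSqlDate` are hypothetical. Going through `LocalDate` drops the time of day, so the result is midnight of that calendar day in the JVM's default time zone. By contrast, `new Date(ts.getTime)` would keep the time component hidden inside the Date. Per the original report, java.sql.Date columns fail in the same way, so this conversion alone may not avoid the Avro error.

```scala
import java.sql.{Date, Timestamp}

object TimestampToDate {
  // Truncates a Timestamp to its calendar day (JVM default time zone)
  // and returns it as a java.sql.Date normalized to midnight.
  def toSqlDate(ts: Timestamp): Date =
    Date.valueOf(ts.toLocalDateTime.toLocalDate)

  def main(args: Array[String]): Unit = {
    val lastSeen = new Timestamp(1454413911000L)
    println(toSqlDate(lastSeen))
  }
}
```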
