Why am I getting the below exception while reading data from Vertica DB using the Vertica Spark connector?

java.lang.RuntimeException: Type array[varchar is not supported.
at scala.sys.package$.error(package.scala:27)
at com.vertica.spark.seg.SegmentsMetaInfo$$anonfun$getSyntheticSegExpr$2.apply(SegmentsMetaInfo.scala:228)
at com.vertica.spark.seg.SegmentsMetaInfo$$anonfun$getSyntheticSegExpr$2.apply(SegmentsMetaInfo.scala:225)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at com.vertica.spark.seg.SegmentsMetaInfo$class.getSyntheticSegExpr(SegmentsMetaInfo.scala:225)
at com.vertica.spark.datasource.DefaultSource.getSyntheticSegExpr(VerticaSource.scala:15)
at com.vertica.spark.seg.SegmentsMetaInfo$class.initSegInfo(SegmentsMetaInfo.scala:50)
at com.vertica.spark.datasource.DefaultSource.initSegInfo(VerticaSource.scala:15)
at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:44)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at com.turn.platform.verticaconnectorpoc.BasicReadWriteExamplesConnectorSQLContext$.main(BasicReadWriteExamplesConnectorSQLContext.scala:66)
at com.turn.platform.verticaconnectorpoc.BasicReadWriteExamplesConnectorSQLContext.main(BasicReadWriteExamplesConnectorSQLContext.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Answers

  • SruthiA Administrator

    What Vertica version are you using? Is it an old connector? If so, use the latest connector with Vertica 11.1.1 and above.

    https://github.com/vertica/spark-connector
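
    If the project is built with sbt, a minimal dependency sketch (the coordinates are an assumption based on Maven Central; verify the exact version against the releases page on GitHub):

    // build.sbt -- assumed Maven Central coordinates for the new connector
    libraryDependencies += "com.vertica.spark" % "vertica-spark" % "3.3.5"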

  • I am using Vertica Database 23.04 and the jars below, because of Spark 2.3 and Scala 2.11 version dependencies.

    • vertica-spark2.1_scala2.11.jar
    • vertica-jdbc-11.0.0-0.jar
  • SruthiA Administrator

    I think you are using the old connector. Could you please try the new connector from GitHub?

  • I tried using the latest version of the Vertica Spark connector (3.3.5), but I still got the same error.

  • SruthiA Administrator

    Could you please share your code?

  • Here is the complete code:

    import org.apache.spark.sql.SparkSession

    object BasicReadWriteExamplesConnector {

      def main(args: Array[String]): Unit = {

        val spark = SparkSession.builder()
          .appName("Vertica - Spark Connector Scala Example")
          .enableHiveSupport()
          .master("yarn")
          .config("spark.jars", "/tmp/vertica-spark-3.3.5.jar,/tmp/vertica-jdbc-23.4.0-0.jar")
          .getOrCreate()

        spark.sparkContext.setLogLevel("INFO")

        // Connection values (hostname, username, etc.) are defined elsewhere.
        val options = Map(
          "host" -> hostname,
          "user" -> username,
          "db" -> dbname,
          "hdfs_url" -> hdfs_url,
          "password" -> password,
          "table" -> tableName,
          "query" -> "select string_column_name from {dbname.tableName} limit 10;"
        )

        val VERTICA_SOURCE = "com.vertica.spark.datasource.DefaultSource"

        try {
          val dfRead = spark.read.format(VERTICA_SOURCE)
            .options(options)
            .load()

          dfRead.show()
        } catch {
          // printStackTrace() already writes to stderr and returns Unit,
          // so there is no need to wrap it in println.
          case ex: Exception => ex.printStackTrace()
        } finally {
          spark.close()
        }
      }
    }

  • SruthiA Administrator

    I notice that you are still using the old DefaultSource class. Could you please change it as below:

    val VERTICA_SOURCE = "com.vertica.spark.datasource.VerticaSource"

    https://docs.vertica.com/24.4.x/en/spark-integration/migrating-from-legacy-spark-connector/#defaultsource-class-renamed-verticasource
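
    For reference, a minimal read sketch against the renamed source, reusing the connection variables from the code above. One assumption here, taken from the migration guide rather than this thread: some options were renamed too, with "staging_fs_url" replacing the legacy "hdfs_url".

    val dfRead = spark.read
      .format("com.vertica.spark.datasource.VerticaSource")
      .option("host", hostname)
      .option("user", username)
      .option("db", dbname)
      .option("password", password)
      .option("staging_fs_url", hdfs_url) // legacy option name: hdfs_url
      .option("table", tableName)
      .load()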

  • When I tried using "com.vertica.spark.datasource.VerticaSource", I got the below error:

    java.lang.ClassNotFoundException: Failed to find data source: com.vertica.spark.datasource.VerticaSource. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
    at com.turn.platform.verticaconnectorpoc.BasicReadWriteExamplesConnector$.main(BasicReadWriteExamplesConnector.scala:70)
    at com.turn.platform.verticaconnectorpoc.BasicReadWriteExamplesConnector.main(BasicReadWriteExamplesConnector.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: com.vertica.spark.datasource.VerticaSource.DefaultSource
    at java.lang.ClassLoader.findClass(ClassLoader.java:523)
    at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
    at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:48)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)

  • SruthiA Administrator

    You need to import the below package; it is newly added:

    import com.vertica.spark._
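
    If the ClassNotFoundException persists, it may also be worth confirming (an assumption on my part, not something verified in this thread) that the connector jar is on the driver classpath, for example by passing it to spark-submit directly. The jar paths below are the ones used earlier in this thread; the application jar path is a placeholder:

    spark-submit \
      --master yarn \
      --jars /tmp/vertica-spark-3.3.5.jar,/tmp/vertica-jdbc-23.4.0-0.jar \
      --class com.turn.platform.verticaconnectorpoc.BasicReadWriteExamplesConnector \
      /path/to/application.jar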
