spark

nabil · March 2016

I am testing the new Spark connector and getting this error:

Exception in thread "main" java.lang.ClassNotFoundException: com.vertica.jdbc.Driver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at com.vertica.spark.seg.VUtil$.getConnection(VUtil.scala:125)
at com.vertica.spark.datasource.DefaultSource$$anonfun$6.apply(VerticaSource.scala:37)
at com.vertica.spark.datasource.DefaultSource$$anonfun$6.apply(VerticaSource.scala:37)
at com.vertica.spark.seg.SegmentsMetaInfo$class.initSegInfo(SegmentsMetaInfo.scala:33)
at com.vertica.spark.datasource.DefaultSource.initSegInfo(VerticaSource.scala:15)
at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:37)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at com.novartis.nibr.benchmark.db.VerticaReadTest$.main(VerticaReadTest.scala:46)
at com.novartis.nibr.benchmark.db.VerticaReadTest.main(VerticaReadTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I am not sure why I would get that. The code is:

...

import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.{SQLContext}
import com.vertica.spark.datasource._
import com.novartis.nibr.benchmark.util.Properties._
import com.novartis.nibr.benchmark.util.Util.time

object VerticaReadTest {

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("read test")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // read in configuration
    // configuration and jdbc info
    val jdbcProp = readConfig("vertica.properties")

    val host = jdbcProp.getString("vertica.host")
    val verticadb = jdbcProp.getString("vertica.db")
    val port = jdbcProp.getInt("vertica.port")
    val dbSchema = "GEDI" // jdbcProp.getString("vertica.schema")
    val user = jdbcProp.getString("vertica.user")
    val password = jdbcProp.getString("vertica.pwd")
    val tableName = "OBSERVATIONS" // jdbcProp.getString("vertica.tableName")

    // OPTIONAL: setup an ip map from Vertica internal IPs to external IPs if needed
    val ipmap: String = jdbcProp.getString("vertica.ipmap")

    // setup the user options, defaults are shown where applicable for optional values.
    val options: Map[String, String] = Map(
      "table" -> tableName,
      "db" -> verticadb,
      "user" -> user,
      "password" -> password,
      "host" -> host,
      // "numPartitions"-> "16" // OPTIONAL (default val shown)
      // "tmpdir" -> "/tmp" // OPTIONAL (default val shown)
      // "failed_rows_percent_tolerance"-> "0.00" // OPTIONAL (default val shown)
      "dbschema" -> dbSchema, // OPTIONAL (default val public)
      // "port" -> "5433" // OPTIONAL (default val shown)
      "ipmap" -> ipmap // OPTIONAL (default val shown)
    )

    val df = sqlContext.read.format("com.vertica.spark.datasource.DefaultSource").options(options).load()

    time(df.count(), "testing read from vertica")
  }
}

SruthiA · March 2016

Hi,

It looks like the Vertica JDBC driver is not installed. Please install it and try running the program again.

Sruthi

nabil · March 2016

Thanks for the quick reply. I did install

vertica-spark-connector-0.2.0.jar

and built the code with it.

It compiled with no errors. Why would I need the JDBC driver? My code has no dependency on it. It looks like it is only needed at runtime.
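The first stack trace shows the connector calling Class.forName from VUtil.getConnection. In other words, the driver class is looked up by name while the program runs, so the compiler never sees the dependency. A minimal sketch of the same kind of lookup, useful for checking whether the driver is on the classpath before calling load(), might look like the following. The object name DriverCheck and the helper isOnClasspath are hypothetical and not part of the connector; only the class name com.vertica.jdbc.Driver comes from the trace.

```scala
import scala.util.Try

// Hypothetical helper, not part of the Vertica connector.
object DriverCheck {

  // Returns true if the named class can be loaded from the current classpath.
  // Class.forName throws ClassNotFoundException when the class is missing,
  // which Try turns into a Failure.
  def isOnClasspath(className: String): Boolean =
    Try(Class.forName(className)).isSuccess

  def main(args: Array[String]): Unit = {
    // Class name taken from the ClassNotFoundException in the stack trace.
    val driver = "com.vertica.jdbc.Driver"
    if (isOnClasspath(driver))
      println(s"$driver found on the classpath")
    else
      println(s"$driver is missing; the JDBC jar has to be supplied at runtime")
  }
}
```

Running a check like this inside the job, before the read, would report a missing driver directly rather than through a ClassNotFoundException raised deep inside the connector.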

I added the actual JDBC jar to spark-submit

exec spark-submit \
--master $MASTER \
--class $BENCH_CLASS \
--jars /.../jars/vertica-jdbc-7.2.1-0.jar \
"$BENCH_JAR" \
"$@"

and now I get the error

Exception in thread "main" java.sql.SQLSyntaxErrorException: [Vertica][VJDBC](3737) ERROR: Invalid projection name OBSERVATION_b0
at com.vertica.util.ServerErrorData.buildException(Unknown Source)
at com.vertica.dataengine.VResultSet.fetchChunk(Unknown Source)
at com.vertica.dataengine.VResultSet.initialize(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.readExecuteResponse(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.handleExecuteResponse(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.execute(Unknown Source)
at com.vertica.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.vertica.jdbc.common.SStatement.executeQuery(Unknown Source)
at com.vertica.spark.seg.SegmentsMetaInfo$class.getSegMap(SegmentsMetaInfo.scala:61)
at com.vertica.spark.datasource.DefaultSource.getSegMap(VerticaSource.scala:15)
at com.vertica.spark.seg.SegmentsMetaInfo$class.initSegInfo(SegmentsMetaInfo.scala:50)
at com.vertica.spark.datasource.DefaultSource.initSegInfo(VerticaSource.scala:15)
at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:37)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at com.novartis.nibr.benchmark.db.VerticaReadTest$.main(VerticaReadTest.scala:46)
at com.novartis.nibr.benchmark.db.VerticaReadTest.main(VerticaReadTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: com.vertica.support.exceptions.SyntaxErrorException: [Vertica][VJDBC](3737) ERROR: Invalid projection name OBSERVATION_b0

I verified that the projection exists, that the table OBSERVATION is actually there, and that I can query it.

nabil · March 2016

I repeated the test with a table in my connection user's schema that had no projections built other than the default. This is the table created by the write example in the connector manual.

Counting that table of 1 record went fine. I then repeated the test against a large observation table in a schema other than the connection schema, and it gave me the same error as above.

So it seems the problem is associated with schemas.

HD · March 2016

Hi Nabil,

For the "Invalid projection name" error, please refer to my reply in this topic-thread:

https://community.dev.hpe.com/t5/Vertica-Forum/Vertica-Spark-Connector-0-2-0-Invalid-Projection-issue/m-p/235104/highlight/true#M12121

Thanks,

Harshad
