spark
I am testing the new Vertica Spark connector and getting this error:
Exception in thread "main" java.lang.ClassNotFoundException: com.vertica.jdbc.Driver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at com.vertica.spark.seg.VUtil$.getConnection(VUtil.scala:125)
at com.vertica.spark.datasource.DefaultSource$$anonfun$6.apply(VerticaSource.scala:37)
at com.vertica.spark.datasource.DefaultSource$$anonfun$6.apply(VerticaSource.scala:37)
at com.vertica.spark.seg.SegmentsMetaInfo$class.initSegInfo(SegmentsMetaInfo.scala:33)
at com.vertica.spark.datasource.DefaultSource.initSegInfo(VerticaSource.scala:15)
at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:37)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at com.novartis.nibr.benchmark.db.VerticaReadTest$.main(VerticaReadTest.scala:46)
at com.novartis.nibr.benchmark.db.VerticaReadTest.main(VerticaReadTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I am not sure why I would get that. The code is:
...
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.SQLContext
import com.vertica.spark.datasource._
import com.novartis.nibr.benchmark.util.Properties._
import com.novartis.nibr.benchmark.util.Util.time

object VerticaReadTest {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("read test")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // read in configuration and jdbc info
    val jdbcProp = readConfig("vertica.properties")
    val host = jdbcProp.getString("vertica.host")
    val verticadb = jdbcProp.getString("vertica.db")
    val port = jdbcProp.getInt("vertica.port")
    val dbSchema = "GEDI" // jdbcProp.getString("vertica.schema")
    val user = jdbcProp.getString("vertica.user")
    val password = jdbcProp.getString("vertica.pwd")
    val tableName = "OBSERVATIONS" // jdbcProp.getString("vertica.tableName")

    // OPTIONAL: set up an IP map from Vertica internal IPs to external IPs if needed
    val ipmap: String = jdbcProp.getString("vertica.ipmap")

    // set up the user options; defaults are shown where applicable for optional values
    val options: Map[String, String] = Map(
      "table" -> tableName,
      "db" -> verticadb,
      "user" -> user,
      "password" -> password,
      "host" -> host,
      // "numPartitions" -> "16",                   // OPTIONAL (default val shown)
      // "tmpdir" -> "/tmp",                        // OPTIONAL (default val shown)
      // "failed_rows_percent_tolerance" -> "0.00", // OPTIONAL (default val shown)
      "dbschema" -> dbSchema, // OPTIONAL (default val public)
      // "port" -> "5433",                          // OPTIONAL (default val shown)
      "ipmap" -> ipmap // OPTIONAL (default val shown)
    )

    val df = sqlContext.read.format("com.vertica.spark.datasource.DefaultSource").options(options).load()
    time(df.count(), "testing read from vertica")
  }
}
Comments
Hi,
It looks like you missed installing the Vertica JDBC driver. Please install it and try running the program again.
Sruthi
Thanks for the quick reply. I did install
vertica-spark-connector-0.2.0.jar
and built the code with it, with no errors. Why would I need the JDBC driver? There are no dependencies on it in the code. It looks like it is needed at runtime.
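That would explain it: judging from the Class.forName frame in the trace above, the connector resolves the driver reflectively at runtime rather than linking against it at compile time, so the build succeeds without the JDBC jar. A minimal sketch of that pattern (DriverCheck, the host/port/db, and the credentials are placeholders for illustration, not the connector's actual code):

import java.sql.{Connection, DriverManager}

object DriverCheck {
  def main(args: Array[String]): Unit = {
    // Reflective load: throws the ClassNotFoundException seen above when the
    // driver jar is missing from the runtime classpath, even though this file
    // compiles without ever referencing the driver class directly.
    Class.forName("com.vertica.jdbc.Driver")
    val url = "jdbc:vertica://host:5433/verticadb" // placeholder connection info
    val conn: Connection = DriverManager.getConnection(url, "user", "password")
    println(s"connected: ${!conn.isClosed}")
    conn.close()
  }
}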
I added the actual JDBC jar to spark-submit:
exec spark-submit \
--master $MASTER \
--class $BENCH_CLASS \
--jars /.../jars/vertica-jdbc-7.2.1-0.jar \
"$BENCH_JAR" \
"$@"
and now I get this error:
Exception in thread "main" java.sql.SQLSyntaxErrorException: [Vertica][VJDBC](3737) ERROR: Invalid projection name OBSERVATION_b0
at com.vertica.util.ServerErrorData.buildException(Unknown Source)
at com.vertica.dataengine.VResultSet.fetchChunk(Unknown Source)
at com.vertica.dataengine.VResultSet.initialize(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.readExecuteResponse(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.handleExecuteResponse(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.execute(Unknown Source)
at com.vertica.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.vertica.jdbc.common.SStatement.executeQuery(Unknown Source)
at com.vertica.spark.seg.SegmentsMetaInfo$class.getSegMap(SegmentsMetaInfo.scala:61)
at com.vertica.spark.datasource.DefaultSource.getSegMap(VerticaSource.scala:15)
at com.vertica.spark.seg.SegmentsMetaInfo$class.initSegInfo(SegmentsMetaInfo.scala:50)
at com.vertica.spark.datasource.DefaultSource.initSegInfo(VerticaSource.scala:15)
at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:37)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at com.novartis.nibr.benchmark.db.VerticaReadTest$.main(VerticaReadTest.scala:46)
at com.novartis.nibr.benchmark.db.VerticaReadTest.main(VerticaReadTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: com.vertica.support.exceptions.SyntaxErrorException: [Vertica][VJDBC](3737) ERROR: Invalid projection name OBSERVATION_b0
I verified that the projection exists, and the table OBSERVATION is actually there and I can query it.
I repeated the test with a table in my user connection schema which had no built projections (other than the default). This is the table created as the write example in the connector manual.
Counting that table of 1 record went fine. I then repeated the test pointing to a large observation table in a schema different from the connection schema, and that one gave me the same error as above.
So it seems the problem is associated with schemas....
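One way to dig further is to list the projections and their schemas straight from the catalog and compare them with the name the connector derives (OBSERVATION_b0 in the error). A hedged sketch, assuming the standard v_catalog.projections system table from the Vertica docs; ProjectionCheck and the connection details are placeholders:

import java.sql.DriverManager

object ProjectionCheck {
  def main(args: Array[String]): Unit = {
    Class.forName("com.vertica.jdbc.Driver")
    val conn = DriverManager.getConnection(
      "jdbc:vertica://host:5433/verticadb", "user", "password") // placeholders
    try {
      // List every schema holding a projection anchored on the table.
      val stmt = conn.prepareStatement(
        """SELECT projection_schema, projection_name
          |FROM v_catalog.projections
          |WHERE anchor_table_name = ?""".stripMargin)
      stmt.setString(1, "OBSERVATIONS")
      val rs = stmt.executeQuery()
      while (rs.next())
        println(rs.getString("projection_schema") + "." + rs.getString("projection_name"))
    } finally conn.close()
  }
}

If the projections show up only under a schema other than the connection schema, that would be consistent with the connector not schema-qualifying the projection name it builds.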
Hi Nabil,
For the "Invalid projection name" error, please refer to my reply in this topic thread:
https://community.dev.hpe.com/t5/Vertica-Forum/Vertica-Spark-Connector-0-2-0-Invalid-Projection-issue/m-p/235104/highlight/true#M12121
Thanks,
Harshad