Spark/Scala using JDBC to connect to Vertica DB is failing
I am trying to connect from Spark v2.3.1 with Scala 2.11.8 to Vertica DB using JDBC.
On your webpage: https://www.vertica.com/blog/whats-new-vertica-8-1-connector-apache-spark/
It says that I can download the Spark Connector at the following location: https://my.vertica.com/download/vertica/8-1-x/
It takes me to a login page; after logging in, a message in a yellow box says that I do not have permission to view that download.
I tried to download from your drivers page, under the Linux package there are 3 jar files in there (vertica-javadoc, vertica-jdbc, vertica-jdbc-8.0.1-0), but I cannot find the Vertica-Spark Connector (i.e.: vertica-8.1.0_spark2.0_scala2.11.jar)
Here is my Spark/Scala jdbc script:
val url = "jdbc:vertica//hostname/DBName?username=username&password=pw"
val query = "SELECT * FROM TABLE;"
val df = spark.read.format("jdbc")
.option("driver", "com.vertica.jdbc.Driver")
.option("url", url)
.option("dbtable", query)
.load()
I am using the Scala Eclipse IDE and have loaded the 3 jar files that come with the Linux driver download (vertica-javadoc, vertica-jdbc, vertica-jdbc-8.0.1-0).
I get this error:
Exception in thread "main" java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:70)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:115)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
at att.com.vert2$.main(vert2.scala:37)
at att.com.vert2.main(vert2.scala)
What am I doing wrong? Am I missing the Vertica-Spark Connector?
Thank you! Markus.
Answers
Hi Markus,
Can you specify which version of Vertica Server you are using? Vertica ships the Spark connector (vertica-spark2.1_scala2.11.jar) with the server RPM. If you are using an older version of Vertica Server, you can download the connector from the vertica.com downloads page.
Markus,
Starting with Vertica Server version 9.1, the Spark connector is distributed as part of the RPM. The Spark Connector 2.1 works with Spark 2.2 and 2.3, which is why we do not distribute a separate jar for Spark 2.2 and 2.3.
You need to place both the connector jar and the JDBC driver jar on the Spark cluster's file system (FYI, the JDBC driver version 9.1.1 is now backward compatible with older versions of Vertica Server).
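Separately from the connector, the plain JDBC read in the question can be sketched as below. This is a minimal sketch under assumptions, not a confirmed diagnosis: one likely cause of the NullPointerException is that the URL in the question is missing the colon after `vertica` (`jdbc:vertica//...`), so the driver does not recognize it. Spark also places the `dbtable` value inside a FROM clause, so a query needs to be a parenthesized subquery with an alias and no trailing semicolon. The object name `VerticaJdbcRead`, its helper functions, the table name `my_table`, and the port 5433 (Vertica's default) are illustrative choices, not taken from the thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object for illustration; names are not from the thread.
object VerticaJdbcRead {
  // Vertica JDBC URLs take the form jdbc:vertica://host:port/database.
  def verticaUrl(host: String, port: Int, db: String): String =
    s"jdbc:vertica://$host:$port/$db"

  // Turn a SQL query into a value usable as Spark's "dbtable" option:
  // drop a trailing semicolon, wrap in parentheses, and add an alias.
  def asSubquery(sql: String, alias: String = "q"): String =
    s"(${sql.trim.stripSuffix(";")}) $alias"

  // Credentials are passed as separate options rather than in the URL.
  def read(spark: SparkSession, url: String, sql: String,
           user: String, password: String): DataFrame =
    spark.read.format("jdbc")
      .option("driver", "com.vertica.jdbc.Driver")
      .option("url", url)
      .option("dbtable", asSubquery(sql))
      .option("user", user)
      .option("password", password)
      .load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("vertica-jdbc-read")
      .master("local[*]")
      .getOrCreate()
    // Placeholder host, database, table, and credentials.
    val url = verticaUrl("hostname", 5433, "DBName")
    read(spark, url, "SELECT * FROM my_table;", "username", "pw").show(5)
    spark.stop()
  }
}
```

Only the Vertica JDBC driver jar needs to be on the classpath for this path; the Spark connector jar is not required for a plain JDBC read.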
This is the location where the Spark connector jars are located in the RPM:
/opt/vertica/packages/SparkConnector
We distribute two jars: vertica-spark2.0_scala2.11.jar and vertica-spark2.1_scala2.11.jar.
Here is a list of commands you can use to extract the two Spark connector jars from the 9.1.1 Vertica Community Edition RPM:
Copy the RPM to a scratch directory (named junk here).
cd to that directory.
[root@localhost junk]# rpm -lqp vertica-9.1.1-0.x86_64.RHEL6.rpm | grep spark
/opt/vertica/packages/SparkConnector/lib/vertica-spark2.0_scala2.11.jar
/opt/vertica/packages/SparkConnector/lib/vertica-spark2.1_scala2.11.jar
[root@localhost junk]# rpm2cpio vertica-9.1.1-0.x86_64.RHEL6.rpm | cpio -idv ./opt/vertica/packages/SparkConnector/lib/vertica-spark2.0_scala2.11.jar
./opt/vertica/packages/SparkConnector/lib/vertica-spark2.0_scala2.11.jar
2459746 blocks
[root@localhost junk]# rpm2cpio vertica-9.1.1-0.x86_64.RHEL6.rpm | cpio -idv ./opt/vertica/packages/SparkConnector/lib/vertica-spark2.1_scala2.11.jar
./opt/vertica/packages/SparkConnector/lib/vertica-spark2.1_scala2.11.jar
2459746 blocks
[root@localhost junk]# ls -l /opt/vertica/packages/SparkConnector/lib
total 592
-rw-r--r--. 1 root root 301786 Jul 22 14:08 vertica-spark2.0_scala2.11.jar
-rw-r--r--. 1 root root 301857 Jul 22 14:08 vertica-spark2.1_scala2.11.jar
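Once the connector jar and the JDBC driver jar are on the Spark classpath, a read through the connector might look like the sketch below. Treat it as an assumption-laden example: the data source name `com.vertica.spark.datasource.DefaultSource` and the option keys (`host`, `db`, `table`, `user`, `password`, `numPartitions`) are written as recalled from the connector documentation for this generation of the connector and should be checked against the docs for your version. The object name `VerticaConnectorRead`, the helper `connectorOptions`, and all values are hypothetical placeholders.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object for illustration; names are not from the thread.
object VerticaConnectorRead {
  // Build the option map handed to the connector. The keys are assumed
  // from the connector documentation; verify them for your version.
  def connectorOptions(host: String, db: String, table: String,
                       user: String, password: String,
                       numPartitions: Int): Map[String, String] =
    Map(
      "host" -> host,
      "db" -> db,
      "table" -> table,
      "user" -> user,
      "password" -> password,
      "numPartitions" -> numPartitions.toString
    )

  // Load a Vertica table as a DataFrame through the connector data source.
  def read(spark: SparkSession, opts: Map[String, String]): DataFrame =
    spark.read
      .format("com.vertica.spark.datasource.DefaultSource")
      .options(opts)
      .load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("vertica-connector-read")
      .master("local[*]")
      .getOrCreate()
    // Placeholder connection details.
    val opts = connectorOptions("hostname", "DBName", "my_table", "username", "pw", 4)
    read(spark, opts).show(5)
    spark.stop()
  }
}
```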
Let us know if this works for you.