Vertica Spark connector
I am currently playing around with Vertica 7.2 and I wanted to access some data via Spark.
There were some announcements that Vertica should work with Spark, and I've seen several posts saying that a Vertica-Spark connector is available here:
but apparently the connector has moved. I've only found the Hadoop connector so far.
Any pointers to the spark connector would be great!
Thanks in advance!
From what I have heard, the Spark connector isn't available right now since it's undergoing some bug fixes and performance improvements.
Thanks a lot for the reply! I see. But isn't an older version available somewhere?
So far I haven't had any luck googling for it.
Thanks a lot!
Hi Ira and others, are there any updates on the expected re-release of the connector, or on previous versions? I see some recent activity on https://community.dev.hpe.com/t5/Vertica-Forum/Loading-spark-dataframe-into-vertica/td-p/233482.
Any update on this? Even an unstable version would help.
Hi Sam and Schneider315,
For now I have used the Spark SQL module and JDBC to read from and write to Vertica.
You've probably already seen the discussion here: https://community.dev.hpe.com/t5/Vertica-Forum/Loading-spark-dataframe-into-vertica/td-p/233482 .
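A minimal sketch of the JDBC write path in PySpark, under my assumptions: the function and variable names are placeholders I picked, `com.vertica.jdbc.Driver` is the standard Vertica JDBC driver class, and 5433 is Vertica's default port. Note that `df.write.jdbc` issues plain INSERTs, so it is slower than a native connector's bulk COPY path.

```python
def vertica_jdbc_url(host, port, database):
    """Build a Vertica JDBC connection URL (e.g. jdbc:vertica://host:5433/db)."""
    return f"jdbc:vertica://{host}:{port}/{database}"

def write_dataframe(df, table, host, database, user, password, port=5433):
    """Append a Spark DataFrame to a Vertica table over plain JDBC."""
    props = {
        "user": user,
        "password": password,
        "driver": "com.vertica.jdbc.Driver",
    }
    # mode="append" inserts into an existing table; "overwrite" would
    # drop and recreate it with Spark-inferred column types.
    df.write.jdbc(vertica_jdbc_url(host, port, database), table,
                  mode="append", properties=props)
```

The Vertica JDBC JAR still has to be on Spark's classpath for this to work (see the `--jars` note below in the thread).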
No news on the connector on my side - I'd be glad to get an update on that too.
I tried your solution and it worked, but only for DataFrames that don't contain NULL in any of the rows; it errors out when a NULL is present. Did you face the same issue? Do you have any solutions in mind?
How are you connecting Spark SQL and Vertica with JDBC? Which JDBC driver are you using?
For 8.0 there is a connector available for download on my.vertica.com, under the "Connector" section towards the bottom of the page.
And the documentation is here:
HPE Product Management
You can get the JDBC driver from the Vertica download site (https://my.vertica.com/download/vertica/client-drivers/) and then access your data like this:
You have to make sure that the JDBC JAR is available to Spark: either package it in your application JAR or provide it via the --jars option.
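A sketch of what that read looks like with Spark's generic JDBC data source (PySpark syntax); the helper names, host, and credentials are placeholders of mine, and `com.vertica.jdbc.Driver` is the Vertica JDBC driver class:

```python
def vertica_jdbc_options(host, port, database, user, password):
    """Connection options for Spark's generic JDBC data source."""
    return {
        "url": f"jdbc:vertica://{host}:{port}/{database}",
        "driver": "com.vertica.jdbc.Driver",
        "user": user,
        "password": password,
    }

def read_vertica_table(spark, table, host, port, database, user, password):
    """Read a Vertica table into a Spark DataFrame over JDBC."""
    return (spark.read
            .format("jdbc")
            .options(**vertica_jdbc_options(host, port, database, user, password))
            .option("dbtable", table)  # can also be a subquery: "(SELECT ...) t"
            .load())
```

Usage would be something like `df = read_vertica_table(spark, "public.customers", "verticahost", 5433, "VMart", "dbadmin", "...")`.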
Concerning the NULL issue when writing DataFrames: I started playing with a custom SQL dialect, but I think the connector mentioned by Mark should help.
Haven't tried it yet, but it seems to support only Spark 1.6.
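In the meantime, one crude workaround for the NULL-on-write problem is to replace NULLs with type-appropriate defaults before writing. This is a sketch under my own assumptions (it changes the data, so it is only acceptable if downstream consumers can treat the defaults as "missing"); `DataFrame.na.fill` is standard PySpark and applies each default only to columns of the matching type:

```python
def fill_nulls_for_write(df, numeric_default=0, string_default=""):
    """Replace NULLs before a JDBC write that cannot handle them.

    na.fill(0) touches only numeric columns, na.fill("") only string
    columns, so the two calls can be chained without interfering.
    """
    return df.na.fill(numeric_default).na.fill(string_default)
```

You would call this right before the JDBC write, e.g. `fill_nulls_for_write(df).write.jdbc(...)`. A cleaner fix is what the native connector or a custom JDBC dialect would do: emit proper NULL parameters instead of substituting values.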