Loading Spark DataFrame into Vertica
http://www.sparkexpert.com/2015/04/17/save-apache-spark-dataframe-to-database/
Hi, I tried loading DataFrames into MySQL using the approach in the link above, and it worked. But when I tried to load them into a Vertica database, I got the following error:
Exception in thread "main" java.sql.SQLSyntaxErrorException: [Vertica][VJDBC](5108) ERROR: Type "TEXT" does not exist
at com.vertica.util.ServerErrorData.buildException(Unknown Source)
at com.vertica.io.ProtocolStream.readExpectedMessage(Unknown Source)
at com.vertica.dataengine.VDataEngine.prepareImpl(Unknown Source)
at com.vertica.dataengine.VDataEngine.prepare(Unknown Source)
at com.vertica.dataengine.VDataEngine.prepare(Unknown Source)
at com.vertica.jdbc.common.SPreparedStatement.<init>(Unknown Source)
at com.vertica.jdbc.jdbc4.S4PreparedStatement.<init>(Unknown Source)
at com.vertica.jdbc.VerticaJdbc4PreparedStatementImpl.<init>(Unknown Source)
at com.vertica.jdbc.VJDBCObjectFactory.createPreparedStatement(Unknown Source)
at com.vertica.jdbc.common.SConnection.prepareStatement(Unknown Source)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:275)
at org.apache.spark.sql.DataFrame.createJDBCTable(DataFrame.scala:1611)
at com.sparkread.SparkVertica.JdbctoVertica.main(JdbctoVertica.java:51)
Caused by: com.vertica.support.exceptions.SyntaxErrorException: [Vertica][VJDBC](5108) ERROR: Type "TEXT" does not exist
… 13 more
This error occurs because Vertica doesn't support the TEXT data type, which Spark's JDBC writer uses for the string columns of the DataFrame (loaded from a Parquet file). I don't want to cast the columns, since that would be a performance issue; we are looking to load around 280 million rows. Could you please suggest the best way to load the data into Vertica?
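Some context on where TEXT comes from: when Spark's JDBC writer creates the table itself (as `createJDBCTable` does in the trace above) and no dialect is registered for the target database, it falls back to a generic type mapping in which string columns become TEXT. Two ways around this that avoid casting columns are to create the table with Vertica types first and then write in append mode, or to register a custom `org.apache.spark.sql.jdbc.JdbcDialect` (via `JdbcDialects.registerDialect`) whose `getJDBCType` returns a Vertica type for `StringType`. The type mapping at the heart of both can be sketched in plain Java. `VerticaDdl`, its methods, and the VARCHAR(65000) width are illustrative assumptions, not part of Spark or the Vertica driver, and only a handful of Spark types are covered.

```java
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper (not a Spark or Vertica API): maps Spark SQL type names
 * to Vertica column types and builds CREATE TABLE DDL, so the target table can
 * exist before Spark appends to it. Spark's generic mapping would emit TEXT
 * for strings, which Vertica rejects.
 */
final class VerticaDdl {
    // Assumed width; 65000 bytes is Vertica's maximum for VARCHAR.
    static final int DEFAULT_VARCHAR_LENGTH = 65000;

    private VerticaDdl() {}

    /** Maps a Spark simpleString type name (e.g. "string", "bigint") to a Vertica type. */
    static String verticaType(String sparkType) {
        switch (sparkType) {
            case "string":    return "VARCHAR(" + DEFAULT_VARCHAR_LENGTH + ")";
            case "int":       return "INTEGER";
            case "bigint":    return "BIGINT";
            case "double":    return "DOUBLE PRECISION";
            case "float":     return "FLOAT";
            case "boolean":   return "BOOLEAN";
            case "date":      return "DATE";
            case "timestamp": return "TIMESTAMP";
            default:
                throw new IllegalArgumentException("No Vertica mapping for Spark type: " + sparkType);
        }
    }

    /**
     * Builds CREATE TABLE DDL from an ordered column-name -> Spark-type map.
     * Column names are emitted unquoted (an assumption: plain identifiers).
     */
    static String createTableSql(String table, Map<String, String> columns) {
        StringJoiner ddl = new StringJoiner(", ", "CREATE TABLE " + table + " (", ")");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            ddl.add(column.getKey() + " " + verticaType(column.getValue()));
        }
        return ddl.toString();
    }
}
```

For example, with a `LinkedHashMap` holding `id -> bigint` and `name -> string`, `VerticaDdl.createTableSql("visits", schema)` returns `CREATE TABLE visits (id BIGINT, name VARCHAR(65000))`. An insertion-ordered map matters because the column order of the DDL should follow the DataFrame's schema.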
Comments
We are planning a beta release of a Spark-to-Vertica connector that could handle your scenario. Send me an email at sunil.venkayala@hpe.com and I will let you know when the connector is available for download.
Thanks
Sunil
I am running into the same issue. Are there any updates on this?
Hi, you can use the following Spark-Vertica connector for this issue:
https://saas.hpe.com/marketplace/big-data/hpe-vertica-connector-apache-spark
Hi Sunil,
I am getting the following exception while saving a Spark DataFrame to a Vertica database.
Can you help me out?
Exception in thread "main" java.sql.SQLException: [Vertica][VJDBC](5108) ERROR: Type "TEXT" does not exist
at com.vertica.util.ServerErrorData.buildException(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.executeSimpleProtocol(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.execute(Unknown Source)
at com.vertica.jdbc.SStatement.executeNoParams(Unknown Source)
at com.vertica.jdbc.SStatement.executeUpdate(Unknown Source)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:302)
at com.hp.spark.ReturnVisitorImportScoreLRFinalOld.main(ReturnVisitorImportScoreLRFinalOld.java:78)
Thanks,
Raj
Hello,
I am running into the same issue. Is there any news on this one?
Thanks a lot!
Ira
Hello,
When the Vertica table already exists with the same column names as the DataFrame (and the corresponding types), the following has worked for me:
Cheers,
Ira
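The workaround above relies on append mode: when the table already exists, `df.write().mode(SaveMode.Append).jdbc(url, table, props)` inserts into it instead of issuing Spark's own CREATE TABLE with TEXT columns. Its precondition (matching columns) can be checked before the write. A minimal sketch, assuming the DataFrame's column names come from `df.columns()` and the table's from JDBC `DatabaseMetaData.getColumns`; the `ColumnCheck` class is hypothetical. It also checks column order, because older Spark releases insert with `INSERT INTO table VALUES (?, ...)` and therefore match columns by position rather than by name (worth verifying for your Spark version).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical pre-write check (not a Spark or Vertica API): confirms that an
 * existing table's columns line up with the DataFrame's before an append.
 */
final class ColumnCheck {
    private ColumnCheck() {}

    /** Returns human-readable mismatches; an empty list means the columns line up. */
    static List<String> mismatches(List<String> dataFrameColumns, List<String> tableColumns) {
        List<String> problems = new ArrayList<>();
        if (dataFrameColumns.size() != tableColumns.size()) {
            problems.add("column count differs: DataFrame has " + dataFrameColumns.size()
                    + ", table has " + tableColumns.size());
        }
        int n = Math.min(dataFrameColumns.size(), tableColumns.size());
        for (int i = 0; i < n; i++) {
            // Compared case-insensitively (assumption: identifiers were not declared case-sensitive).
            String dfName = dataFrameColumns.get(i).toLowerCase(Locale.ROOT);
            String tableName = tableColumns.get(i).toLowerCase(Locale.ROOT);
            if (!dfName.equals(tableName)) {
                problems.add("position " + i + ": DataFrame column '" + dataFrameColumns.get(i)
                        + "' vs table column '" + tableColumns.get(i) + "'");
            }
        }
        return problems;
    }
}
```

An empty result means the append can proceed; otherwise the messages show which positions disagree. Column types are not compared here, so the table still needs to be created with suitable Vertica types.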