
Unable to write spark DF in Vertica using API


Answers

  • Prakhar84 (Vertica Customer)

    Hi Lenoy,
    Thanks for the detailed explanation, but I will need your help to implement this in PySpark code. I searched on Google but could not find much; based on the link below, I tried:
    https://stackoverflow.com/questions/51731998/how-to-add-custom-jdbc-dialects-in-pyspark
    Are you saying to
    a) save the VerticaDialect.scala shared earlier in a location and then call pyspark like below?
    pyspark2 --jars /home/x/vertica-9.0.1_spark2.1_scala2.11.jar,/home/x/vertica-jdbc-9.2.0-0.jar,/home/x/VerticaDialect.scala
    I tried the following:
    from py4j.java_gateway import java_import
    gw = spark.sparkContext._gateway
    java_import(gw.jvm, "com.me.VerticaDialect")
    gw.jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(gw.jvm.com.me.VerticaDialect())
    but I get an error:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'JavaPackage' object is not callable

    Please help us implement this in PySpark as well, so that it will be helpful for clients who use PySpark rather than Scala.

  • Bryan_H (Vertica Employee, Administrator)

    Hi, I believe you would need to compile the Scala file into a JAR and add it to the classpath. We will run some tests to determine the best approach, but it may be fairly complex to set up.
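
    A minimal, untested sketch of that approach, assuming the dialect is compiled as a class com.me.VerticaDialect with a no-arg constructor (the class name comes from the snippet above; all paths and JAR names are placeholders):

    # Compile the dialect outside PySpark first (shell, not Python):
    #   scalac -classpath "$SPARK_HOME/jars/*" VerticaDialect.scala
    #   jar cf vertica-dialect.jar com/me/VerticaDialect*.class
    # Then launch PySpark with the compiled JAR (not the .scala source):
    #   pyspark2 --jars /home/x/vertica-jdbc-9.2.0-0.jar,/home/x/vertica-dialect.jar
    from py4j.java_gateway import java_import

    gw = spark.sparkContext._gateway
    java_import(gw.jvm, "com.me.VerticaDialect")
    # This call only succeeds once the compiled class is on the JVM classpath;
    # passing the .scala source to --jars is what produces the
    # "'JavaPackage' object is not callable" error above.
    gw.jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(
        gw.jvm.com.me.VerticaDialect()
    )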

  • Prakhar84 (Vertica Customer)

    Hi Bryan,
    It will be very beneficial for clients who use PySpark to push data into Vertica to have this set up properly. If you can guide us step by step through my code, others can leverage the same approach.
    Is there a way the dialect can be written directly for PySpark? (Just asking.)
    Also, what would be the best way to integrate this with Spark, rather than registering the dialect every time?
    Prakhar

  • LenoyJ (Employee)

    @Prakhar84, quick tangent: I'm curious, what became of checking the connectivity between Vertica & HDFS? Most of my customers just get the Spark Connector working, which usually solves most issues. I believe you raised another discussion here; let's continue that discussion there.

  • Prakhar84 (Vertica Customer)

    @Bryan_H

    Please let us know of any solution for this dialect in PySpark.
    Thanks for your guidance so far.

    Prakhar

  • Bryan_H (Vertica Employee, Administrator)

    Hi, a workaround we have suggested to other customers is to write Parquet files to a temporary folder that Vertica can read, then use the vertica-python driver to issue a COPY command that imports the Parquet files (see http://github.com/vertica/vertica-python for details). A sketch follows below.
    We are investigating better solutions; however, in my own opinion, we should implement a complete VerticaDialect and commit it to Apache Spark to fix this for all Spark programs, whether Java, Scala, or PySpark. It will take a long time to develop, though, and then a long time for Apache to accept and publish it as part of the next Spark release.
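
    A rough sketch of that workaround; the staging path, connection details, schema, and table name are all placeholders:

    import vertica_python

    # 1) Write the Spark DataFrame to a staging folder the Vertica node(s) can read.
    df.write.mode("overwrite").parquet("/shared/staging/my_table_parquet")

    # 2) Issue a COPY over the staged Parquet files via vertica-python.
    conn_info = {
        "host": "vertica-host", "port": 5433,
        "user": "dbadmin", "password": "***", "database": "mydb",
    }
    with vertica_python.connect(**conn_info) as conn:
        cur = conn.cursor()
        cur.execute(
            "COPY my_schema.my_table "
            "FROM '/shared/staging/my_table_parquet/*.parquet' PARQUET"
        )
        conn.commit()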

  • LenoyJ (Employee)

    I'd also recommend checking with the Spark community on how to use JDBC dialects in PySpark.

  • Bryan_H (Vertica Employee, Administrator)

    Hi, you can add compiled classes to the classpath. However, you would need to rebuild Spark from a GitHub checkout after applying the following patch:
    https://github.com/bryanherger/spark/commit/84d3014e4ead18146147cf299e8996c5c56b377d
    This would embed the VerticaDialect into the Spark build. To avoid having to reinstall the custom Spark build, you can extract the compiled VerticaDialect class and apply that at runtime as part of the classpath.
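
    A sketch of that runtime option, assuming the compiled class from the patched build was extracted into /home/x/vertica-dialect.jar (the path is a placeholder, and the object name follows the patch above; if the dialect were compiled as a plain class rather than a Scala object, it would be constructed with () instead of accessed via MODULE$):

    # Launch with the extracted class on the driver and executor classpaths:
    #   pyspark2 --driver-class-path /home/x/vertica-dialect.jar \
    #            --jars /home/x/vertica-dialect.jar
    gw = spark.sparkContext._gateway
    # A Scala `object` compiles to a class named with a trailing "$" whose
    # singleton instance lives in the static MODULE$ field.
    dialect = getattr(gw.jvm.org.apache.spark.sql.jdbc, "VerticaDialect$").MODULE$
    gw.jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(dialect)

    Registering once at SparkSession startup, e.g. in a shared bootstrap script, would also address the earlier question about avoiding re-registering the dialect in every job.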
