Unable to write Spark DF to Vertica using API

Prakhar84 Vertica Customer

Hi Team,
I need an answer as I am stuck.
I am getting the error below when trying to save a Spark DataFrame to Vertica. This is blocking my work, so any help would be really appreciated. A couple of things to note:
a) We have a Kerberized Cloudera cluster and a Kerberized Vertica cluster. All the necessary XML files have already been copied to the Vertica cluster as part of setup.
b) I can see entries in the run table that is created in Vertica when I try to write the DataFrame to Vertica. Does that mean the Hadoop-Vertica connection is established?

Below is the error:

spark_DF.write.save(format="com.vertica.spark.datasource.DefaultSource", mode="append", opts)
19/12/04 14:52:00 ERROR s2v.S2V: Failed to save DataFrame to Vertica table: est_vertica
Traceback (most recent call last):
  File "", line 1, in
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/pyspark/sql/readwriter.py", line 703, in save
    self._jwrite.save()
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in call
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o85.save.
java.lang.Exception: S2V: FATAL ERROR for job S2V_job2702546586973782764. Job status information is available in the Vertica table S2V_JOB_STATUS_USER. Unable to create/insert into target table FRR.test_vertica with SaveMode: Append. ERROR MESSAGE: ERROR: java.sql.SQLException: [Vertica]VJDBC ERROR: Failed to glob [hdfs:/x/y/tmp/vertica/S2V_job2702546586973782764/*.orc] because of error: Could not connect to [hdfs://x]



  • Jim_Knicely Administrator
    edited December 2019

    Check the last line of the error: Could not connect to [hdfs://x]

    See the documentation page "Verifying HDFS Configuration".

  • LenoyJ Employee

    @Prakhar84, what was the result of checking the connectivity between Vertica and HDFS?

  • Prakhar84 Vertica Customer

    This is not working. Shruthi even mentioned in another post that it won't work with a Kerberized setup. If it can be made to work, that would be ideal. Here is the output of the checks:

    ok: krb5 exists at [/etc/krb5.conf]
    ok: Vertica Keytab file is set to [/opt/vertica/config/vertica2.kt]
    ok: Vertica Keytab file exists at [/opt/vertica/config/vertica2.kt]
    Kerberos configuration parameters set in the database
    KerberosServiceName : [vertica]
    KerberosHostname : [db.intlb.org.net]
    KerberosRealm : [QA.org.NET]
    KerberosKeytabFile : [/opt/vertica/config/vertica2.kt]
    Vertica Principal: [vertica/db.intlb.org.net@QA.org.NET]
    ok: Vertica can kinit using keytab file

    Validation Success
    v_x_node0001: HadoopConfDir [/catalog/x//hadoop-wcc-conf] is valid
    v_x_node0002: HadoopConfDir [/catalog/x//hadoop-wcc-conf] is valid
    v_x_node0003: HadoopConfDir [/catalog/x//hadoop-wcc-conf] is valid

    Shruthi said: we don't officially support Kerberos authentication with the Spark Connector yet.

  • Bryan_H Vertica Employee Administrator

    You might be able to embed additional Kerberos or SSL/TLS parameters into the JDBC URL through the db option, as follows:
    finaldf.write.format("com.vertica.spark.datasource.DefaultSource").options(table="schema.table", db="Analytics?ssl=true", user="vertica", password="****", host="vertica.etlnodes.com").mode("append").save()
    You should be able to append multiple parameters, as in an HTTP URL, by separating them with "&". However, I think you'll still need a Kerberos login on every Spark node.
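
As a rough illustration of the "&"-separated form described above, here is a minimal Python sketch that builds the db option string from a dictionary of JDBC parameters. The helper names build_db_option and build_connector_options are hypothetical and not part of the connector. KerberosServiceName is used only as an example of a second parameter, and whether the connector passes such parameters through to the JDBC driver is an assumption that has not been verified here.

```python
def build_db_option(db_name, jdbc_params=None):
    """Append JDBC parameters to the database name as ?k1=v1&k2=v2.

    Hypothetical helper: it only formats the string, it does not
    validate that the driver accepts the parameters.
    """
    if not jdbc_params:
        return db_name
    query = "&".join(f"{key}={value}" for key, value in jdbc_params.items())
    return f"{db_name}?{query}"


def build_connector_options(table, db_name, host, user, password, jdbc_params=None):
    """Collect the options used in the write call above into one dict."""
    return {
        "table": table,
        "db": build_db_option(db_name, jdbc_params),
        "host": host,
        "user": user,
        "password": password,
    }


opts = build_connector_options(
    table="schema.table",
    db_name="Analytics",
    host="vertica.etlnodes.com",
    user="vertica",
    password="****",
    # Example parameters only; adjust to what your driver and setup need.
    jdbc_params={"ssl": "true", "KerberosServiceName": "vertica"},
)

print(opts["db"])  # Analytics?ssl=true&KerberosServiceName=vertica

# With a Spark session available, the dict could then be passed as:
# finaldf.write.format("com.vertica.spark.datasource.DefaultSource") \
#     .options(**opts).mode("append").save()
```

Keeping the parameters in a dictionary makes it easier to add or remove one while testing, without editing the URL string by hand.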
