Unable to write Spark DataFrame to Vertica using the Spark connector API
Hi Team,
I need an answer as I'm stuck. I'm getting the error below when trying to save a Spark DataFrame to Vertica. This is blocking my work, so any help would be really appreciated. A couple of things to note:
a) We have a Kerberized Cloudera cluster and a Kerberized Vertica cluster; all the necessary XML configuration files were already copied to the Vertica cluster as part of setup.
b) When I try to write the DataFrame, I can see entries in the run (job status) table that the connector creates in Vertica. Does that mean the Hadoop-to-Vertica connection is established?
Below is the call and the error:

spark_DF.write.save(format="com.vertica.spark.datasource.DefaultSource", mode="append", **opts)

19/12/04 14:52:00 ERROR s2v.S2V: Failed to save DataFrame to Vertica table: test_vertica
Traceback (most recent call last):
  File "", line 1, in
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/pyspark/sql/readwriter.py", line 703, in save
    self._jwrite.save()
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o85.save.
java.lang.Exception: S2V: FATAL ERROR for job S2V_job2702546586973782764. Job status information is available in the Vertica table S2V_JOB_STATUS_USER.
Unable to create/insert into target table FRR.test_vertica with SaveMode: Append.
ERROR MESSAGE: ERROR: java.sql.SQLException: [Vertica]VJDBC ERROR: Failed to glob [hdfs:/x/y/tmp/vertica/S2V_job2702546586973782764/*.orc] because of error: Could not connect to [hdfs://x]
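For reference, opts is a dict of connector options roughly like the one below. Every value is a placeholder, and the staging-location option names (hdfs_url, web_hdfs_url) are my reading of the 9.x Spark connector documentation:

# Sketch of the connector options; all values below are placeholders.
opts = {
    "table": "test_vertica",            # target table
    "dbschema": "FRR",                  # target schema
    "db": "Analytics",                  # Vertica database
    "user": "vertica_user",
    "password": "****",
    "host": "db.intlb.org.net",
    # Staging location the connector writes intermediate ORC files to,
    # which Vertica then loads back over hdfs://:
    "hdfs_url": "hdfs://x/y/tmp/vertica",
    "web_hdfs_url": "webhdfs://x:50070/y/tmp/vertica",
}

spark_DF.write.save(
    format="com.vertica.spark.datasource.DefaultSource",
    mode="append",
    **opts
)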
Answers
Check the last line: Could not connect to [hdfs://x]
Check the doc page "Verifying HDFS Configuration":
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/HadoopIntegrationGuide/Kerberos/Verifying.htm
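A quick way to run those checks, and to look at the job status table from your question (b), is with the vertica-python client. This is a minimal sketch assuming vertica-python is installed; the connection details are placeholders. Note that rows in S2V_JOB_STATUS_USER only prove the Spark-to-Vertica JDBC leg works; the leg failing in your error is Vertica reading the staged ORC files back from HDFS.

import vertica_python

# Placeholder connection details -- substitute your own.
conn_info = {
    "host": "db.intlb.org.net",
    "port": 5433,
    "user": "dbadmin",
    "password": "****",
    "database": "Analytics",
}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    # The two checks from the doc page above.
    for check in ("KERBEROS_CONFIG_CHECK", "HDFS_CLUSTER_CONFIG_CHECK"):
        cur.execute("SELECT {}()".format(check))
        print(cur.fetchall())
    # Job status table named in the error message; an entry here means the
    # connector reached Vertica over JDBC, not that Vertica can reach HDFS.
    cur.execute("SELECT * FROM S2V_JOB_STATUS_USER")
    for row in cur.fetchall():
        print(row)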
@Prakhar84, what came of checking the connectivity between Vertica and HDFS?
This is not working. Shruthi even mentioned in another post that it won't work with Kerberized clusters. If it could work, that would be ideal.
SELECT KERBEROS_CONFIG_CHECK();
ok: krb5 exists at [/etc/krb5.conf]
ok: Vertica Keytab file is set to [/opt/vertica/config/vertica2.kt]
ok: Vertica Keytab file exists at [/opt/vertica/config/vertica2.kt]
Kerberos configuration parameters set in the database
KerberosServiceName : [vertica]
KerberosHostname : [db.intlb.org.net]
KerberosRealm : [QA.org.NET]
KerberosKeytabFile : [/opt/vertica/config/vertica2.kt]
Vertica Principal: [vertica/db.intlb.org.net@QA.org.NET]
ok: Vertica can kinit using keytab file
SELECT HDFS_CLUSTER_CONFIG_CHECK();
Validation Success
v_x_node0001: HadoopConfDir [/catalog/x//hadoop-wcc-conf] is valid
v_x_node0002: HadoopConfDir [/catalog/x//hadoop-wcc-conf] is valid
v_x_node0003: HadoopConfDir [/catalog/x//hadoop-wcc-conf] is valid
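Both checks pass here, but they validate configuration rather than an actual data read. Reusing the same vertica-python approach as above, you can probe a real read over hdfs:// with a throwaway external table. This is a sketch; the file path is a placeholder for any small text file the Vertica principal can read. If the SELECT fails with the same "Could not connect to [hdfs://x]" error, the problem is purely Vertica-to-HDFS and has nothing to do with Spark:

import vertica_python

conn_info = {  # same placeholder connection details as before
    "host": "db.intlb.org.net", "port": 5433,
    "user": "dbadmin", "password": "****", "database": "Analytics",
}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    # Throwaway external table pointed at a small test file in HDFS.
    cur.execute(
        "CREATE EXTERNAL TABLE hdfs_probe (line VARCHAR(65000)) "
        "AS COPY FROM 'hdfs:///tmp/hdfs_probe.txt' DELIMITER E'\\001'"
    )
    cur.execute("SELECT * FROM hdfs_probe LIMIT 5")  # the HDFS read happens here
    print(cur.fetchall())
    cur.execute("DROP TABLE hdfs_probe")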
Shruthi said: we don't officially support Kerberos authentication with the Spark Connector yet.
You might be able to embed additional Kerberos or SSL/TLS parameters into the JDBC URL as follows:
finaldf.write.format("com.vertica.spark.datasource.DefaultSource") \
    .options(table="schema.table", db="Analytics?ssl=true", user="vertica", password="****", host="vertica.etlnodes.com") \
    .mode("append") \
    .save()
You should be able to append multiple parameters, as in an HTTP URL, by separating them with "&". However, you'll still need to get a Kerberos login on every Spark node, I think.
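For example, with the Kerberos-related connection properties from the Vertica JDBC documentation (property names assumed from those docs, values are placeholders):

db="Analytics?ssl=true&KerberosServiceName=vertica&KerberosHostname=db.intlb.org.net"

And for the "Kerberos login on every Spark node" part, one best-effort workaround is to run kinit once per executor slot before the write. This is a sketch, not officially supported; the keytab path and principal are placeholders, the keytab must already be present on every worker (e.g. shipped with --files), and spark is assumed to be the active SparkSession as in the pyspark shell:

import subprocess

def kinit_partition(rows):
    # Placeholder keytab path and principal -- adjust for your realm.
    subprocess.check_call(
        ["kinit", "-kt", "/etc/security/keytabs/etl.keytab", "etl@QA.org.NET"]
    )
    return rows

sc = spark.sparkContext
# Touch every executor slot so each obtains a Kerberos ticket cache.
sc.parallelize(range(sc.defaultParallelism), sc.defaultParallelism) \
    .mapPartitions(kinit_partition) \
    .count()

This only guarantees a ticket on executors that happen to pick up one of those tasks, so treat it as a diagnostic rather than a fix.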