Copy From webhdfs via Knox using Vertica Spark Connector
Hello,
In preparation for using the Vertica Integration for Spark on Vertica 9.0.1, I'm testing access along the path Vertica -> Knox -> HDFS (webhdfs only) with a Vertica COPY command, inspired by
https://forum.vertica.com/discussion/238808/copy-from-webhdfs-using-knox
copy myVerticaTable from 'webhdfs://hostname:8443/gateway/default/webhdfs/v1/user/u1/my/path/to/my.orc' ON ANY NODE ORC DIRECT;
I get an error message:
[Vertica]VJDBC ERROR: Failed to glob [
webhdfs://hostname:8443/gateway/default/webhdfs/v1/user/u1/my/path/to/my.orc] because of error:
[http://hostname:8443/webhdfs/v1/gateway/default/webhdfs/v1/user/u1/my/path/to/my.orc?user.name=verticaUsername&op=GETFILESTATUS]: Curl Error: Server returned nothing (no headers, no data)
Error Details: Empty reply from server
My questions:
- How can I get webhdfs resolved to an https address instead of http?
- How can I fix the seemingly wrong resolved path (it contains 'webhdfs/v1' twice)?
- How can I configure Knox credentials (username, password) in the Vertica Integration using Spark options (see the bottom of https://www.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/SparkConnector/WritingtoVerticaUsingDefaultSource.htm?TocPath=Integrating%20with%20Apache%20Spark|Saving%20an%20Apache%20Spark%20DataFrame%20to%20a%20Vertica%20Table|_____1)?
- Is there any other advice on how to fix this error?
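For reference, the URL in the error message suggests how the webhdfs:// address was rewritten: the scheme became http, and '/webhdfs/v1' was prepended to the entire path, which would explain the duplicated segment. The Python sketch below only reproduces that observed mapping. The function name is hypothetical, and this is an inference from the error text, not Vertica's actual code.

```python
from urllib.parse import urlsplit, urlencode


def observed_rest_url(webhdfs_url, user, op="GETFILESTATUS"):
    """Mimic the rewrite visible in the error message (illustration only)."""
    parts = urlsplit(webhdfs_url)
    if parts.scheme != "webhdfs":
        raise ValueError("expected a webhdfs:// URL")
    # The error message shows plain http and a '/webhdfs/v1' prefix
    # placed in front of the complete original path.
    query = urlencode({"user.name": user, "op": op})
    return f"http://{parts.netloc}/webhdfs/v1{parts.path}?{query}"


url = observed_rest_url(
    "webhdfs://hostname:8443/gateway/default/webhdfs/v1/user/u1/my/path/to/my.orc",
    "verticaUsername",
)
print(url)
```

If this reading is right, any path that already carries the Knox prefix '/gateway/default/webhdfs/v1' ends up with 'webhdfs/v1' twice, and the request goes out over http to a port that presumably expects https.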
Comments
Hi, Knox is not supported by Vertica or its components. If you (or anyone else reading this!) require support for Knox, please open a support case and request Knox support.
In the meantime, I can think of a few workarounds:
- Write the DataFrame to Vertica directly using JDBC; see a Postgres example at https://stackoverflow.com/questions/38825836/write-spark-dataframe-to-postgres-database
- Write the DataFrame to a supported staging area (NFS, S3, FTP to a Vertica node, etc.) and use JDBC to tell Vertica to COPY from there.
- Write the DataFrame to a Kafka topic and have Vertica read from there. This adds another moving part.
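As a rough illustration of the first workaround, here is a minimal PySpark-style sketch that writes a DataFrame through Spark's generic JDBC data source. The helper names, host, database, and credentials are hypothetical placeholders, and it assumes the Vertica JDBC driver jar is on the Spark classpath. It is a sketch under those assumptions, not a tested recipe for Vertica 9.0.1.

```python
def vertica_jdbc_options(host, database, table, user, password, port=5433):
    """Build options for Spark's generic JDBC writer (hypothetical helper)."""
    return {
        "url": f"jdbc:vertica://{host}:{port}/{database}",
        "driver": "com.vertica.jdbc.Driver",  # Vertica JDBC driver class
        "dbtable": table,
        "user": user,
        "password": password,
    }


def write_dataframe(df, options, mode="append"):
    """Write a Spark DataFrame over JDBC; df is a pyspark.sql.DataFrame."""
    df.write.format("jdbc").options(**options).mode(mode).save()


# Placeholder connection details, for illustration only.
opts = vertica_jdbc_options(
    "vertica-host", "mydb", "myVerticaTable", "dbadmin", "secret"
)
# write_dataframe(df, opts)  # call with a real DataFrame in a Spark session
```

This path avoids HDFS and Knox entirely, at the cost of pushing all rows through JDBC connections rather than a bulk COPY.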
Thanks Bryan, for the quick reply and alternatives.
So does that mean the COPY command suggested by your colleague (Jim Knicely) is not valid?
https://forum.vertica.com/discussion/238808/copy-from-webhdfs-using-knox
It was suggested, but Knox isn't exactly a proxy. Vertica currently has no way to handle credentials in the Knox model.
OK, thanks for the clarification.