
Copy From webhdfs via Knox using Vertica Spark Connector

Chris
edited July 2019 in General Discussion

Hello,

In preparation for using the Vertica Integration for Spark on Vertica 9.0.1, I'm testing access from Vertica -> Knox -> HDFS (WebHDFS only) with a Vertica COPY command, inspired by
https://forum.vertica.com/discussion/238808/copy-from-webhdfs-using-knox

copy myVerticaTable from 'webhdfs://hostname:8443/gateway/default/webhdfs/v1/user/u1/my/path/to/my.orc' ON ANY NODE ORC DIRECT;

I get an error message:

[Vertica]VJDBC ERROR: Failed to glob [webhdfs://hostname:8443/gateway/default/webhdfs/v1/user/u1/my/path/to/my.orc] because of error:
[http://hostname:8443/webhdfs/v1/gateway/default/webhdfs/v1/user/u1/my/path/to/my.orc?user.name=verticaUsername&op=GETFILESTATUS]: Curl Error: Server returned nothing (no headers, no data)
Error Details: Empty reply from server
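Comparing the URI passed to COPY with the URL in the error suggests how the webhdfs:// scheme was translated. The Python sketch below reproduces that translation. The rule is inferred from this single error message, not taken from Vertica documentation, and `observed_rewrite` and `knox_url` are hypothetical helper names. For comparison, it also builds the URL shape a Knox gateway normally expects for WebHDFS (https, the gateway prefix, and `/webhdfs/v1` exactly once), assuming the topology is named `default` as in the command above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def observed_rewrite(webhdfs_uri: str, user: str) -> str:
    """Mimic the webhdfs:// -> http:// translation seen in the error above.

    Assumption: inferred from one error message only, not from Vertica docs.
    """
    parts = urlsplit(webhdfs_uri)
    # The scheme becomes plain http, and the fixed REST prefix /webhdfs/v1
    # is prepended to the whole path, including Knox's /gateway/... prefix.
    path = "/webhdfs/v1" + parts.path
    query = urlencode({"user.name": user, "op": "GETFILESTATUS"})
    return urlunsplit(("http", parts.netloc, path, query, ""))


def knox_url(host: str, port: int, topology: str, hdfs_path: str, op: str) -> str:
    """URL shape a Knox gateway expects: https, gateway prefix, /webhdfs/v1 once."""
    path = f"/gateway/{topology}/webhdfs/v1{hdfs_path}"
    return urlunsplit(("https", f"{host}:{port}", path, urlencode({"op": op}), ""))


if __name__ == "__main__":
    uri = "webhdfs://hostname:8443/gateway/default/webhdfs/v1/user/u1/my/path/to/my.orc"
    print(observed_rewrite(uri, "verticaUsername"))
    print(knox_url("hostname", 8443, "default", "/user/u1/my/path/to/my.orc", "GETFILESTATUS"))
```

Printing both for the path in the COPY command shows the two differences behind questions 1 and 2: the scheme drops to http, and `/webhdfs/v1` ends up both before and after the Knox gateway prefix.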

My questions:

  1. How do I get the webhdfs URI resolved to an https address instead of http?
  2. How do I fix the apparently wrongly resolved path ('webhdfs/v1' appears twice)?
  3. How do I configure the Knox credentials (username, password) in the Vertica Integration using Spark options (see bottom of https://www.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/SparkConnector/WritingtoVerticaUsingDefaultSource.htm?TocPath=Integrating%20with%20Apache%20Spark|Saving%20an%20Apache%20Spark%20DataFrame%20to%20a%20Vertica%20Table|_____1)?
  4. Any other advice on how to fix this error?
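For questions 1 and 3, it can help to confirm, independently of Vertica, that Knox itself answers over https when given credentials. The sketch below builds a WebHDFS GETFILESTATUS request with an HTTP Basic `Authorization` header, assuming the Knox topology authenticates with HTTP Basic (as Knox's sample LDAP-backed topologies do). The helper names are hypothetical, and this only tests the gateway; it does not make Vertica send credentials.

```python
import base64
import json
import ssl
import urllib.request


def knox_getfilestatus_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying HTTP Basic credentials for a Knox gateway."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"}, method="GET")


def fetch_file_status(req: urllib.request.Request, cafile=None) -> dict:
    """Send the request over TLS and return the WebHDFS FileStatus object.

    Pass cafile if the gateway certificate is signed by a private CA.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(req, context=ctx) as resp:
        # WebHDFS answers GETFILESTATUS with {"FileStatus": {...}}
        return json.load(resp)["FileStatus"]


if __name__ == "__main__":
    url = ("https://hostname:8443/gateway/default/webhdfs/v1"
           "/user/u1/my/path/to/my.orc?op=GETFILESTATUS")
    req = knox_getfilestatus_request(url, "u1", "secret")  # placeholder credentials
    print(fetch_file_status(req))
```

`fetch_file_status` performs the actual network call. An HTTP 401 (raised as `urllib.error.HTTPError`) points at the credentials, while a `FileStatus` reply means Knox and HDFS are reachable and the problem lies on the Vertica side.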

Comments

  • Bryan_H Administrator

    Hi, Knox is not supported by Vertica or its components. If you (or anyone else reading this!) need Knox support, please open a support case and request it.
    In the meantime, I can think of a few workarounds:
      1. Write the DataFrame to Vertica directly over JDBC; see a Postgres example at https://stackoverflow.com/questions/38825836/write-spark-dataframe-to-postgres-database
      2. Write the DataFrame to a supported staging area (NFS, S3, FTP to a Vertica node, etc.) and use JDBC to tell Vertica to COPY from there.
      3. Write the DataFrame to a Kafka topic and load it into Vertica from there. This adds another moving part.
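The first two workarounds can be sketched in PySpark as follows. This is a minimal sketch, assuming the Vertica JDBC driver jar is on the Spark classpath, the driver class is `com.vertica.jdbc.Driver`, Vertica listens on its default port 5433, and the staging directory is a shared mount (e.g. NFS) visible to every Vertica node, which `ON ANY NODE` requires. All helper names, hosts, tables, and paths are hypothetical.

```python
def vertica_jdbc_options(host, database, table, user, password, port=5433):
    """Spark JDBC data source options for writing into a Vertica table."""
    return {
        "url": f"jdbc:vertica://{host}:{port}/{database}",
        "driver": "com.vertica.jdbc.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def write_via_jdbc(df, options, mode="append"):
    """Workaround 1: push DataFrame rows to Vertica over JDBC (no HDFS or Knox)."""
    df.write.format("jdbc").options(**options).mode(mode).save()


def write_to_staging(df, staging_dir):
    """Workaround 2, step 1: write ORC files to a directory Vertica can read."""
    df.write.mode("overwrite").orc(staging_dir)


def copy_from_staging_sql(table, staged_glob):
    """Workaround 2, step 2: COPY statement for the staged ORC files."""
    quoted = staged_glob.replace("'", "''")  # escape single quotes in a SQL string literal
    return f"COPY {table} FROM '{quoted}' ON ANY NODE ORC DIRECT;"
```

The statement returned by `copy_from_staging_sql` can then be executed over JDBC or any other Vertica client connection. Writing directly over JDBC is simpler; staging plus COPY usually loads large DataFrames faster because Vertica reads the files in bulk.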

  • Thanks, Bryan, for the quick reply and the alternatives.
    So does that mean the COPY command suggested by your colleague (Jim Knicely) does not work?
    https://forum.vertica.com/discussion/238808/copy-from-webhdfs-using-knox

  • Bryan_H Administrator

    It was suggested, but Knox isn't exactly a proxy. Vertica currently has no way to handle credentials in the Knox model.

  • OK, thanks for the clarification.
