Does Vertica 9.3.1 support connectivity to a Hadoop Cluster which is Knox enabled?

prativanayak Vertica Customer
edited April 2021 in General Discussion

I am trying to establish connectivity between Vertica and a Knox-enabled Hadoop cluster. As per the documentation, I have placed core-site.xml and hdfs-site.xml in a directory, and it shows up when I execute VERIFY_HADOOP_CONF_DIR.
However, other Hadoop functions such as HDFS_CLUSTER_CONFIG_CHECK() and EXTERNAL_CONFIG_CHECK fail.
I also tried a Parquet export with EXPORT TO PARQUET(directory='hdfs:///data/dir_name').

It fails with the error below.

dbadmin=> EXPORT TO PARQUET(directory='hdfs:///data/dir_name/parquet-test')
dbadmin-> AS select dataflow_id, dataflow_load_date from schemaname.tabname limit 5;
NOTICE 8194: HDFS cluster [hdfs://xxxx/] has wire encryption enabled. Falling back to swebhdfs
HINT: You can continue to use 'hdfs://' in your queries, but they are actually using 'swebhdfs://'
ERROR 8198: Unable to verify if directory [hdfs:///data/dir_name/parquet-test/] exists due to 'Error listing directory [hdfs:///data/dir_name/parquet-test] [https://xxx.xxxx.net:xxxx/webhdfs/v1/data/dir_name/parquet-test?op=LISTSTATUS&user.name=dbadmin]: Curl Error: Couldn't connect to server
Error Details: Failed to connect to xx-xxxx-xxx.xxxx.net port xxxx: Connection refused
[https://xxxxx-xxxxxxx.xxxx.net:xxxx/webhdfs/v1/data/dir_name/parquet-test?op=LISTSTATUS&user.name=dbadmin]: Curl Error: Couldn't connect to server

Error Details: Failed to connect to xxxxx-xxxxxxx.xxxx.net port xxxxx: Connection refused

The URL formed for the export uses the NameNode address rather than the Knox address, which is why it fails. Since Knox is the entry point, the URL should use the Knox address instead of contacting the NameNode directly.
I didn't find any documentation on whether anything specific needs to be done on a Knox-enabled system.
Does anyone have any idea how to get around this?
Thanks in advance.

Best Answer

  • SruthiA Vertica Employee Administrator
    Answer ✓

    @prativanayak: We don't officially support Knox yet. Please open a support case so that we can debug further.


  • SruthiA Vertica Employee Administrator
    edited March 2021

    @prativanayak You will receive that error message when WebHDFS is not enabled in the Hadoop cluster. What is the dfs.webhdfs.enabled property set to in the HDFS cluster? Vertica does form the URL using the NameNode address. Can you share more about Knox?
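
For readers who want to check that property quickly, here is a minimal sketch that reads a property from Hadoop-style configuration XML. The sample XML and the helper name `get_hadoop_property` are made up for illustration; in practice you would read the text of your own hdfs-site.xml instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample for illustration only; a real hdfs-site.xml
# contains many more properties.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
"""


def get_hadoop_property(xml_text, name):
    """Return the value of a named property from Hadoop-style XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


print(get_hadoop_property(SAMPLE_HDFS_SITE, "dfs.webhdfs.enabled"))
```

A result of None means the property is not set in that file, which is different from it being set to false.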

  • prativanayak Vertica Customer
    edited April 2021

    Thanks for your response, Sruthi.
    Yes, WebHDFS is enabled in the Hadoop cluster.
    Snippet below from hdfs-site.xml ----


    Additionally, a curl command to read/write data in HDFS goes through, but the URL for that looks something like https://knox.xxxx.xxxxxxxx.net:xxxx/webhdfs/v1/...
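
To make the mismatch concrete, the sketch below builds the same LISTSTATUS request against two different base addresses. The hostnames, ports, and the `gateway/default` path segment are placeholders I am assuming for illustration (the real values in this thread are masked), and `webhdfs_url` is a hypothetical helper, not a Vertica or Knox API.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base, hdfs_path, op, user=None):
    """Build <base>/webhdfs/v1<path>?op=<OP>[&user.name=<user>]."""
    params = {"op": op}
    if user is not None:
        params["user.name"] = user
    return f"{base.rstrip('/')}/webhdfs/v1{quote(hdfs_path)}?{urlencode(params)}"


# Placeholder NameNode address (assumed): the shape of URL seen in the error.
direct = webhdfs_url(
    "https://namenode.example.net:9871",
    "/data/dir_name/parquet-test",
    "LISTSTATUS",
    user="dbadmin",
)

# Placeholder Knox address (assumed): same operation, gateway base instead.
# Authentication details are omitted here.
via_knox = webhdfs_url(
    "https://knox.example.net:8443/gateway/default",
    "/data/dir_name/parquet-test",
    "LISTSTATUS",
)

print(direct)
print(via_knox)
```

The only difference between the two requests is the base address, which is the part Vertica derives from the NameNode settings in the Hadoop configuration files.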
