
Vertica Integration with Hadoop

Hi Experts,
I am currently facing an issue integrating Vertica with Hadoop.
We have a requirement to get data from Hadoop (HDFS).
Is there any way to export data from there?
Can you provide simple steps to configure this in Vertica 9.1 Enterprise?
We also need to keep the data in sync with HDFS: whatever is updated in HDFS should be updated in Vertica as well.

Thanks in advance!



Best Answers


  • Could you please run a sample curl test to see whether we can access the Hadoop cluster? Please replace the file name, path, and NameNode value accordingly.

    curl -i -L  http://hadoopNameNode:50070/webhdfs/v1/tmp/test.txt?op=OPEN

  • Hi Sruthi,
    I am unable to open the link you provided.
    Do I need to set up a passwordless connection between the Vertica nodes and the Hadoop cluster?
    It would be helpful if you could provide the steps to configure Vertica with Hadoop.


  • SruthiA (Employee)
    edited September 2019

    Hi Mujeef,

    It is not a link. Please replace the NameNode, path, and file name in the curl command below, run it on a Vertica node, and share the output with me.

    curl -i -L http://hadoopNameNode:50070/webhdfs/v1/tmp/test.txt?op=OPEN
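
For anyone repeating the check above, here is a minimal shell sketch of the same curl test. The helper names `webhdfs_url` and `check_webhdfs` are made up for illustration, and the host, port, and path are the placeholders from the post, not real values. Quoting the URL matters because an unquoted `?` can be treated as a glob character by the shell. The port is an assumption too: the command above uses 50070, while the export statement later in this thread uses 9870, so use whatever your NameNode actually listens on.

```shell
#!/usr/bin/env bash
# Build a WebHDFS URL of the form used in the curl test above.
# Arguments: NameNode host, HTTP port, absolute HDFS path, operation.
webhdfs_url() {
    local host="$1" port="$2" path="$3" op="$4"
    printf 'http://%s:%s/webhdfs/v1%s?op=%s\n' "$host" "$port" "$path" "$op"
}

# Run the check from a Vertica node.
# -i prints the response headers, -L follows any redirect that is returned.
check_webhdfs() {
    curl -i -L "$(webhdfs_url "$1" "$2" "$3" OPEN)"
}

# Example (placeholders; replace with real values before running):
# check_webhdfs hadoopNameNode 50070 /tmp/test.txt
```

If the file contents come back, the Vertica node can reach the cluster over WebHDFS; a connection error or an HTTP error status is the output worth sharing in the thread.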

  • Hi Sruthi,
    We have CDH 6.1.1. Can it be integrated with Vertica 9.1 or 9.2?

  • Hi Mujeef,

    The latest version we support and have officially tested is CDH 6.0 on Vertica 9.2. You can integrate with 6.1.1, and it should work in general. If you face any issues, please open a support case.


  • Hi Sruthi,
    Thanks for your quick response.
    I have one more question.
    I need real-time synchronization from Hadoop to Vertica.
    I mean that whatever changes are made in HDFS should be applied to the Vertica database as well.
    Is there a way we can achieve this?
    Mujeef Shaikh

  • Hello Mujeef,
    Using steps similar to those described above, you can expose the HDFS data to Vertica through an external table, for example:
    CREATE EXTERNAL TABLE test_data (col1 varchar(200),col2 varchar(200),col3 varchar(200),col4 varchar(200)) AS COPY FROM 'hdfs:///test/*.txt' DELIMITER ',';

    This means the data in any "txt" file in the "/test" HDFS folder is automatically visible through the table "test_data". Because an external table reads its source files at query time, if another txt file is added to that folder, or existing files are dropped or amended, those changes are reflected in the table the next time it is queried.

  • Hi mflower,
    Thanks for your valuable guidance.
    What happens when we run an incremental load on the same tables from HDFS?
    How will it perform?
    Will it be a full load every time?
    Could you please give more guidance on this?

  • Hello Mujeef,
    The external table provides a link to the external data source.
    Here's a link to more information:

  • Hello mflower,
    I have successfully connected CDH 6.1.1 to Vertica 9.2.1.
    I am also able to access data in CDH from Vertica.
    I am exporting data from Vertica to Hadoop using the "EXPORT TO PARQUET" statement.
    Is there any other way to export data from Vertica to Hadoop?
    I ask because every time I run "EXPORT TO PARQUET", it creates a new directory for the exported object.
    I am using the statement below, for reference:
    EXPORT TO PARQUET(file = 'webhdfs://x.x.x.x:9870/data/',
    fileMode='432', dirMode='rwxrw-r-x')
    AS SELECT * FROM hadoop.test_data9;
    Could you please suggest a better way to do this?
    Thanks in advance.
    Mujeef Shaikh
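
One possible way to live with the new-directory behaviour described in the last post is to give each export run its own subdirectory under a fixed base path. The sketch below only prints the statement; it does not run it. The helper name `export_stmt` and the run-id scheme are assumptions for illustration, and the statement text mirrors the one in the post above, so check the parameter names against the documentation for your Vertica version before using it.

```shell
#!/usr/bin/env bash
# Print an EXPORT TO PARQUET statement that targets a per-run subdirectory.
# Arguments: base URI, run identifier, source table (all chosen by the caller).
export_stmt() {
    # Strip one trailing slash from the base URI so the path has no "//".
    local base_uri="${1%/}" run_id="$2" table="$3"
    printf "EXPORT TO PARQUET(file = '%s/%s') AS SELECT * FROM %s;\n" \
        "$base_uri" "$run_id" "$table"
}

# Example: one directory per run, named by a UTC timestamp, piped to vsql.
# export_stmt 'webhdfs://x.x.x.x:9870/data/' "$(date -u +%Y%m%dT%H%M%SZ)" hadoop.test_data9 | vsql
```

With this layout every run lands in a predictable place under the same parent, so a reader on the Hadoop side can pick the newest subdirectory or clean up old ones.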
