Vertica Integration with Hadoop

Hi Expertise,
Currently I am facing issue with Vertica Integration with hadoop.
We have requirement to get data from hadoop(HDFS ).
Is there any way to export data from there ?
Can provide simple steps to configure in Vertica 9.1 Enterprise ?
Also we have to sync up data with HDFS ,whatever the data is updated in HDFS same changes updated in vertica as well.

Thanks in advance !!

Mujeef

Tagged:

Best Answers

Answers

  • SruthiASruthiA Vertica Employee Administrator

    Could you please run sample CURL test to see if we can access hadoop cluster? Please replace the filename, path and namenode value accordingly.

    curl -i -L  http://hadoopNameNode:50070/webhdfs/v1/tmp/test.txt?op=OPEN

  • Hi Sruthi,
    I unable to open the link you provided.
    Do I need to set passwordless connection between Vertica Node to Hadoop cluster ?
    It will be helpful if you provide steps to configure vertica with hadoop.

    Thanks,
    Mujeef

  • SruthiASruthiA Vertica Employee Administrator
    edited September 2019

    Hi Mujeef,

    It is not a link. please replace Namenode , path and file name in the below CURL command and run it on vertica node and share me the output.

    curl -i -L http://hadoopNameNode:50070/webhdfs/v1/tmp/test.txt?op=OPEN

  • Hi Sruthi,
    We have CDH 6.1.1 , can we integrated with Vertica 9.1 or 9.2 ?
    Thanks,
    Mujeef

  • SruthiASruthiA Vertica Employee Administrator

    Hi Mujeef,

    The latest we support and officially tested is 6.0 on 9.2. You can integrate it with 6.1.1 and it should work in general. if you face any issues, please open a support case

    https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SupportedPlatforms/IntegrationsForHadoop.htm

    Sruthi

  • Hi Shruti,
    Thanks for your quick response.
    Just here one question
    I need to real time synchronization with Hadoop to Vertica.
    I mean Whatever the changes don on HDFS same will update on Vertica database as well.
    Is there a way we can archive this ?
    Thanks,
    Mujeef Shaikh

  • mflowermflower Vertica Employee Employee

    Hello Mujeef,
    Using similar steps to those described above, in order to expose the HDFS data to Vertica
    ie: CREATE EXTERNAL TABLE test_data (col1 varchar(200),col2 varchar(200),col3 varchar(200),col4 varchar(200)) AS COPY FROM 'hdfs:///test/*.txt' DELIMITER ',';

    This would mean data in any "txt" file in the "/test" HDFS folder will be automatically visible via the table "test_data". Also, if another txt file is added to that folder or existing files are dropped or amended, these changes will be reflected in the table.

  • Hi mflower,
    Thanks for your valuable guidance.
    So when run incremental load on same tables from HDFS?
    How it will perform?
    Every time it will be the full loading ?
    Can you please your more guidance on this ?
    Thanks,
    Mujeef

  • mflowermflower Vertica Employee Employee

    Hello Mujeef,
    The external table provides a link to the external data source.
    Here's a link to more information: https://my.vertica.com/docs/latest/HTML/Content/Authoring/AdministratorsGuide/Tables/ExternalTables/CreatingExternalTables.htm

  • Hello Mflower,
    I had done successful connection with CDH 16.1.1 with Vertica 9.2.1
    Also I am able to access data from CDH to Vertica.
    I am exporting data from Vertica to Hadoop using "Export to Paraquet " statement .
    Is there any other way to export data from Vertica to Hadoop?
    Because every time I am performing "Export to Vertica" it is creating new directory to export the object.
    I am using below statement for the reference :-
    EXPORT TO PARQUET(file = 'webhdfs://x.x.x.x:9870/data/',
    fileMode='432', dirMode='rwxrw-r-x')
    AS SELECT * FROM hadoop.test_data9;
    Can you please provide a better way we can do it. ?
    Thanks in advance.
    Mujeef Shaikh

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file