Vertica Integration with Hadoop
Hi Experts,
Currently I am facing an issue with Vertica integration with Hadoop.
We have a requirement to get data from Hadoop (HDFS).
Is there any way to export data from there?
Can you provide simple steps to configure this in Vertica 9.1 Enterprise?
Also, we have to keep the data in sync with HDFS: whenever data is updated in HDFS, the same changes should be reflected in Vertica as well.
Thanks in advance !!
Mujeef
Best Answers
LenoyJ - Employee
Referring you to our documentation that goes through all this in much detail: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/HadoopIntegrationGuide/HadoopIntegrationGuide.htm
Basically:
- Locate your HDFS config files: core-site.xml and hdfs-site.xml.
- Copy them over to every Vertica node you have, at the same path. Say, /home/dbadmin.
- Run vsql and point the database to that directory:
  ALTER DATABASE dbname SET HadoopConfDir = '/home/dbadmin';
- If you are on 8.1 or later, verify that the configuration is okay with:
  SELECT VERIFY_HADOOP_CONF_DIR();
- If you are on 9.2 or later, a better verification function is available:
  SELECT HDFS_CLUSTER_CONFIG_CHECK();
- Then use the hdfs:/// scheme with COPY to get data from HDFS. See: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/HadoopIntegrationGuide/libhdfs/HdfsURL.htm
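For example, once HadoopConfDir is set, a minimal COPY sketch could look like the following (the table name staging_data and the HDFS path are placeholders, not from this thread):
-- Sketch: load a comma-delimited file from HDFS into an existing table.
-- Replace staging_data and the hdfs:/// path with your own table and file.
COPY staging_data FROM 'hdfs:///data/sample.csv' DELIMITER ',';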
Mujeef ✭
Hi Lenoy,
Thank you very much for your response.
I followed all of the above steps successfully.
While running the below statement, it failed:
CREATE EXTERNAL TABLE public.test_data (col1 varchar(200),col2 varchar(200),col3 varchar(200),col4 varchar(200))
AS COPY FROM 'hdfs:///test/.txt' DELIMITER ',';
Error: [Vertica][VJDBC] ERROR: Failed to glob [hdfs:///mujju/.txt] because of error: hdfs:///test/: listStat failed; error: End of file
Can you suggest?
Thanks,
Mujeef
SruthiA Administrator
@Mujeef, what you are seeing with regard to EXPORT TO PARQUET is expected. From the documentation:
"The destination directory for the Parquet files. The directory must not exist, and the current user must have permission to write it. The destination can be on HDFS, S3, or an NFS mount point on the local file system."
https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/SQLReferenceManual/Statements/EXPORTTOPARQUET.htm
Please find the below link, which describes how to export data from Vertica to S3, the local file system, etc.:
https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/HadoopIntegrationGuide/NativeFormats/ExportingData.htm
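Since the destination directory must not already exist, one common approach is to export each run into a new, uniquely named subdirectory. A rough sketch (the path and table name below are placeholders):
-- Sketch: write each export to a directory that does not exist yet,
-- for example one named per run or per date. Path and table are placeholders.
EXPORT TO PARQUET(directory = 'webhdfs://namenode:9870/data/export_run_01')
AS SELECT * FROM my_schema.my_table;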
Answers
Could you please run a sample curl test to see if we can access the Hadoop cluster? Please replace the filename, path, and NameNode value accordingly.
curl -i -L http://hadoopNameNode:50070/webhdfs/v1/tmp/test.txt?op=OPEN
Hi Sruthi,
I am unable to open the link you provided.
Do I need to set up a passwordless connection between the Vertica nodes and the Hadoop cluster?
It would be helpful if you could provide steps to configure Vertica with Hadoop.
Thanks,
Mujeef
Hi Mujeef,
It is not a link. Please replace the NameNode, path, and file name in the below curl command, run it on a Vertica node, and share the output with me.
curl -i -L http://hadoopNameNode:50070/webhdfs/v1/tmp/test.txt?op=OPEN
Hi Sruthi,
We have CDH 6.1.1. Can we integrate it with Vertica 9.1 or 9.2?
Thanks,
Mujeef
Hi Mujeef,
The latest version we support and have officially tested is CDH 6.0 on Vertica 9.2. You can integrate with 6.1.1 and it should work in general; if you face any issues, please open a support case.
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SupportedPlatforms/IntegrationsForHadoop.htm
Sruthi
Hi Sruthi,
Thanks for your quick response.
Just one question here:
I need real-time synchronization from Hadoop to Vertica.
I mean, whatever changes are done on HDFS should be updated in the Vertica database as well.
Is there a way we can achieve this?
Thanks,
Mujeef Shaikh
Hello Mujeef,
You can use similar steps to those described above to expose the HDFS data to Vertica, i.e.:
CREATE EXTERNAL TABLE test_data (col1 varchar(200), col2 varchar(200), col3 varchar(200), col4 varchar(200)) AS COPY FROM 'hdfs:///test/*.txt' DELIMITER ',';
This means that data in any "txt" file in the "/test" HDFS folder will be automatically visible via the table "test_data". Also, if another txt file is added to that folder, or existing files are dropped or amended, those changes will be reflected in the table.
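As a quick check (just a sketch using the table defined above), the same query can be run before and after files change under /test, and the results will move with HDFS:
-- The external table reads the HDFS files at query time, so the count
-- changes as .txt files are added, amended, or removed under /test.
SELECT COUNT(*) FROM test_data;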
Hi mflower,
Thanks for your valuable guidance.
So what happens when we run an incremental load on the same tables from HDFS?
How will it perform?
Will it be a full load every time?
Can you please give more guidance on this?
Thanks,
Mujeef
Hello Mujeef,
The external table provides a link to the external data source.
Here's a link to more information: https://my.vertica.com/docs/latest/HTML/Content/Authoring/AdministratorsGuide/Tables/ExternalTables/CreatingExternalTables.htm
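If re-reading every file on each query becomes too slow, one option (a sketch, not something prescribed in this thread; the native table name is hypothetical) is to materialize the external table's data into a regular Vertica table and refresh it on a schedule:
-- Hypothetical native table with the same layout as the external table.
CREATE TABLE test_data_native (
    col1 varchar(200),
    col2 varchar(200),
    col3 varchar(200),
    col4 varchar(200)
);
-- Periodic refresh: a full reload is shown here; adding a WHERE clause on a
-- date or batch column would make the load incremental.
TRUNCATE TABLE test_data_native;
INSERT INTO test_data_native SELECT * FROM test_data;
COMMIT;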
Hello Mflower,
I have made a successful connection between CDH 6.1.1 and Vertica 9.2.1.
I am also able to access data from CDH in Vertica.
I am exporting data from Vertica to Hadoop using the EXPORT TO PARQUET statement.
Is there any other way to export data from Vertica to Hadoop?
Because every time I perform the export, it creates a new directory for the exported objects.
I am using the below statement for reference:
EXPORT TO PARQUET(directory = 'webhdfs://x.x.x.x:9870/data/',
    fileMode = '432', dirMode = 'rwxrw-r-x')
AS SELECT * FROM hadoop.test_data9;
Can you please suggest a better way to do this?
Thanks in advance.
Mujeef Shaikh