Query Parquet data through Vertica (Vertica Hadoop Integration)

edited May 2017 in General Discussion

Hello guys,
So I have a Hadoop cluster with three nodes, and Vertica is co-located on the cluster. There are Parquet files on HDFS, and my goal is to query those files using Vertica.

Right now I am using the HDFS Connector: basically, I create an external table in Vertica and link it to the files on HDFS:

CREATE EXTERNAL TABLE tableName (columns)
AS COPY FROM 'hdfs://hostname/...../data' PARQUET;

Because the data is large, this method will not achieve good performance.

I have done some research; here is the link: [Vertica Hadoop Integration](https://my.vertica.com/docs/7.2.x/HTML/index.htm#Authoring/HadoopIntegrationGuide/HadoopIntegrationGuide.htm?TocPath=Integrating%20with%20Apache%20Hadoop|_____0)
I have tried HCatalog, but there is a configuration error on my Hadoop cluster, so that is not working.

My requirement is to leave the data on HDFS in its current format (Parquet) while querying it with Vertica. Any ideas on how to do that?
