Query Parquet data through Vertica (Vertica Hadoop Integration)
So I have a Hadoop cluster with three nodes. Vertica is co-located on cluster. There are Parquet files on HDFS. My goal is to query those files using Vertica.
Right now what I did is using HDFS Connector, basically create an external table in Vertica, then link it to HDFS:
CREATE EXTERNAL TABLE tableName (columns)
AS COPY FROM "hdfs://hostname/...../data" PARQUET;
Since the data size is big. This method will not achieve good performance.
I have done some research, Vertica Hadoop Integration
I have tried HCatalog but there's some configuration error on my Hadoop so that's not working.
My use case is to not change data format on HDFS(Parquet), while query it using Vertica. Any ideas on how to do that?