Hive to Vertica data export with Unix named pipe

kosmik5 · January 2018

Hi,
Can someone please help me do a large, fast export from Hive to Vertica without any Hadoop connector?

Currently, I am exporting the data through a Unix named pipe, but the performance is not good.
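For readers unfamiliar with the mechanism, here is a minimal sketch of how a named pipe streams rows between two processes without writing an intermediate file. It is not the poster's actual setup: the writer stands in for the Hive export, the reader stands in for the Vertica loader, and the pipe name `export.pipe` and the sample rows are hypothetical.

```python
import os
import tempfile
import threading


def stream_through_fifo(rows):
    """Send rows through a named pipe and return what the reader received."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "export.pipe")  # hypothetical pipe name
    os.mkfifo(fifo_path)  # Unix only

    received = []

    def consumer():
        # Stand-in for the loader; open() blocks until a writer connects.
        with open(fifo_path, "r") as pipe:
            for line in pipe:
                received.append(line.rstrip("\n"))

    reader = threading.Thread(target=consumer)
    reader.start()

    # Stand-in for the export; open() blocks until a reader connects.
    with open(fifo_path, "w") as pipe:
        for row in rows:
            pipe.write(row + "\n")

    reader.join()
    os.remove(fifo_path)
    os.rmdir(workdir)
    return received


# Hypothetical pipe-delimited sample rows.
result = stream_through_fifo(["1|alpha", "2|beta"])
```

Because the reader and writer are coupled through the pipe, the whole transfer runs at the speed of the slower side, which is why it matters which end is the bottleneck.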

I use about 5 parallel threads to load the data into Vertica, and the load takes approximately 230 minutes for 1.6 billion records.

Can someone please help me improve this performance and tell me whether this export can be optimized?

Thank You
kosmiktechnologies.com

Jim_Knicely · January 2018

Why not just read directly from HDFS?

See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/HadoopIntegrationGuide/libhdfs/ReadingDirectlyFromHDFS.htm

sKwa · January 2018

Hi!

Can you specify where the bottleneck is (reading data from Hive or writing data to Vertica)? Is it an I/O problem or a network problem? Is it a CPU bottleneck (for example, a custom parser)?

"About 5 parallel threads to load the data into Vertica, and the load takes approximately 230 minutes for 1.6 billion records."

That doesn't tell us anything.

How much data is it (bytes/MB/GB/TB)?
How many nodes?
What is your physical configuration (are Hive and Vertica on the same servers)?
Did you check I/O on Vertica (maybe the I/O limit has been reached)?
Do all 5 threads load into the same node, or are the loads distributed across the cluster nodes?
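To illustrate the last question, here is a minimal sketch of spreading loader threads across nodes in round-robin order, so that no single node receives every stream. The node names and the three-node cluster are hypothetical; the thread never states the cluster size or how connections are made.

```python
from itertools import cycle


def assign_loaders(num_loaders, nodes):
    """Map each loader index to a node, cycling through the node list."""
    if not nodes:
        raise ValueError("at least one node is required")
    node_cycle = cycle(nodes)
    return {loader: next(node_cycle) for loader in range(num_loaders)}


# Hypothetical three-node cluster with the 5 loader threads from the question.
plan = assign_loaders(5, ["node1", "node2", "node3"])
```

With these assumed values, loaders 0 and 3 share `node1`, loaders 1 and 4 share `node2`, and loader 2 has `node3` to itself.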

PS: There are too many open questions.
You need to specify where the problem is (it may be a hardware problem, and your disks can't deliver better I/O).
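For scale, the figures quoted in the question can be turned into a row rate. This is only a sketch of the arithmetic, assuming exactly 5 threads that share the work evenly; it says nothing about bytes per second, because the data volume is not given.

```python
def rows_per_second(total_rows, minutes):
    """Average row throughput over the whole load."""
    return total_rows / (minutes * 60)


# Figures from the question: 1.6 billion records in about 230 minutes.
overall = rows_per_second(1_600_000_000, 230)

# Assumption: exactly 5 threads, each carrying an equal share.
per_thread = overall / 5
```

That works out to roughly 116,000 rows per second overall, or about 23,000 rows per second per thread. Whether that is slow depends on the row width and the hardware, which is why the questions above matter.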
