Load data from Vertica to Hadoop using Sqoop

  1. I am trying to pull a Vertica table with 15 billion records into Hadoop, and it is extremely slow. The Sqoop statement is shown below; I am using 25 mappers. Previously I pulled another table with 211 million records in under 10 minutes, which is quite good, so I thought 15 billion would not be that bad. But the Hadoop job browser shows the job not going anywhere after 5%. I checked both the Hadoop and Vertica sides, and resources look OK. I am asking our network people to check the traffic between the Vertica and Hadoop clusters (the bandwidth is 20GB+). Has anybody here tried this kind of pull before, and any thoughts on what might be going wrong? Also, MC shows memory usage under 10%, but when I run free in a shell, my 128GB of memory is almost used up. Why is that?
  2. sqoop import -m 25 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://myhost:5433/vgt_edw" --P  --username "dbadmin" --target-dir "/someLocationOnHadoop" --verbose --query 'SELECT * FROM mytable WHERE  $CONDITIONS' --split-by myKey;
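For anyone reading the command above: with `-m 25` and `--split-by myKey`, Sqoop looks up the minimum and maximum of the split column and gives each mapper one slice of that range by substituting a range predicate for `$CONDITIONS`. The sketch below only approximates that division for an integer key. The function name `split_ranges` and the sample bounds are made up for illustration, and the real splitter may round differently. The thing it shows is that slices are equal in key range, not in row count, so a skewed `myKey` could leave a few mappers doing most of the work.

```shell
#!/usr/bin/env bash
# Sketch only: approximates how an integer --split-by range is divided
# among mappers. split_ranges is a hypothetical helper, not part of Sqoop.
split_ranges() {
  local min=$1 max=$2 mappers=$3
  local span=$(( max - min ))
  local i lo hi
  for (( i = 0; i < mappers; i++ )); do
    # Integer arithmetic: each slice covers about span/mappers key values.
    lo=$(( min + span * i / mappers ))
    hi=$(( min + span * (i + 1) / mappers ))
    if (( i == mappers - 1 )); then
      # Last slice is closed so the maximum key is included.
      echo "myKey >= $lo AND myKey <= $hi"
    else
      echo "myKey >= $lo AND myKey < $hi"
    fi
  done
}

# Example with assumed bounds 1..1000 and 4 mappers.
split_ranges 1 1000 4
```

Comparing the actual row counts per slice in Vertica (a `COUNT(*)` with each predicate) would show whether the splits are balanced.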
