Load data from Vertica to Hadoop using Sqoop

yan_fei · October 2014

I am trying to pull a Vertica table with 15 billion records into Hadoop, and it is extremely slow. The Sqoop statement is shown below; I am using 25 mappers. Previously I was able to pull another table with 211 million records in less than 10 minutes, which is quite good, so I thought 15 billion would not be that bad. But what I see in the job browser on the Hadoop side is that after 5% the job is not going anywhere. I checked both the Hadoop and Vertica sides, and resources look OK. I am asking our network people to check the traffic between the Vertica and Hadoop clusters (the bandwidth is 20GB+). Has anybody here tried this kind of pull before, and any thoughts on where it might be going wrong?

Also, in MC I see memory usage is only < 10%, but when I go to the shell and type free, my 128GB of memory is almost used up. Why is that?
sqoop import -m 25 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://myhost:5433/vgt_edw" --P --username "dbadmin" --target-dir "/someLocationOnHadoop" --verbose --query 'SELECT * FROM mytable WHERE $CONDITIONS' --split-by myKey;
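
For context on what -m 25 and --split-by do here: as far as I understand, Sqoop looks up the minimum and maximum of the split column and gives each mapper an even slice of that range, substituted for $CONDITIONS. The Python below is only a rough sketch of that idea, not Sqoop's actual code; the function names and the bounds (1 to 15 billion) are made up for illustration. If myKey is skewed or has large gaps, even slices of the range would not mean even row counts, which is one thing I want to rule out.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive integer range [lo, hi] into at most
    num_mappers contiguous, non-overlapping slices of near-equal width."""
    total = hi - lo + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = lo
    for i in range(num_mappers):
        # The first `extra` slices absorb the remainder, one value each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer distinct values than mappers
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(column, ranges):
    """Render each slice as the predicate a mapper would use in place
    of $CONDITIONS (illustrative format only)."""
    return [f"{column} >= {a} AND {column} <= {b}" for a, b in ranges]


if __name__ == "__main__":
    # Hypothetical bounds: assumes myKey runs from 1 to 15 billion.
    slices = split_ranges(1, 15_000_000_000, 25)
    for clause in where_clauses("myKey", slices):
        print(clause)
```

With evenly spread keys, each of the 25 slices above would cover 600 million key values; the real distribution of myKey in mytable may be very different.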
