
Hadoop-Vertica Connector: Can I write to Vertica from the Map phase?

Hi all, I am using the Hadoop-Vertica Connector to import a large file into Vertica. I am trying to do this with a map-only Hadoop job, without a Reducer. However, the Vertica output table does not seem to be initialized during the map phase, and I get errors every time.

The documentation doesn't say whether we can write to Vertica from the Mapper, so I was wondering if that is possible?

Thank you!

EDIT:

Error:
java.io.IOException: Cannot set record by name if names not initialized
  at com.vertica.hadoop.VerticaRecord.set(VerticaRecord.java:270)
  at com.vertica.hadoop.VerticaWordCount$TokenizerMapper.map(VerticaWordCount.java:92)
  at com.vertica.hadoop.VerticaWordCount$TokenizerMapper.map(VerticaWordCount.java:60)
  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.security.UserGroupInformation.doA

After checking the source code of VerticaWordCount.java, I found that the list of column names for the output table was not initialized at all.
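The failure mode in the stack trace can be illustrated with a minimal, self-contained sketch. NamedRecord below is a hypothetical stand-in written only for this illustration; it is not the connector's VerticaRecord, and its constructor and methods are assumptions. It only models the behaviour that the error message describes: setting a value by column name fails when the name list was never loaded, while setting by position does not depend on the names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; NOT the connector's VerticaRecord.
final class NamedRecord {
    private final List<String> names;   // null when column names were never loaded
    private final List<Object> values;  // one slot per column

    NamedRecord(int columnCount, List<String> names) {
        this.names = names;
        this.values = new ArrayList<>();
        for (int i = 0; i < columnCount; i++) {
            values.add(null);
        }
    }

    // Setting by position never consults the name list.
    void set(int index, Object value) {
        values.set(index, value);
    }

    // Setting by name needs the name list to resolve the column position.
    void set(String name, Object value) throws IOException {
        if (names == null || names.isEmpty()) {
            throw new IOException("Cannot set record by name if names not initialized");
        }
        int index = names.indexOf(name);
        if (index < 0) {
            throw new IOException("Unknown column: " + name);
        }
        set(index, value);
    }

    Object get(int index) {
        return values.get(index);
    }
}

final class NamedRecordDemo {
    public static void main(String[] args) throws IOException {
        // Names known: set-by-name resolves "b" to position 1.
        NamedRecord withNames = new NamedRecord(3, Arrays.asList("a", "b", "c"));
        withNames.set("b", "hello");
        System.out.println(withNames.get(1));

        // Names missing: set-by-index still works, set-by-name throws.
        NamedRecord withoutNames = new NamedRecord(3, null);
        withoutNames.set(0, 42);
        try {
            withoutNames.set("a", 42);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real record class behaves the same way, the open question is why the column names are missing when the record is created inside the Mapper. The sketch does not answer that.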

Here is my configuration in run():

  Job job = new Job(conf, "vertica hadoop");
  conf = job.getConfiguration();
  conf.set("mapreduce.job.tracker", "local");

  //job.setInputFormatClass(VerticaInputFormat.class);

  // You have to set the MapOutputKeyClass and MapOutputValueClass,
  // since by default they will be the same as the classes of the
  // Reducer's output key and value
  job.setMapOutputKeyClass(Text.class);
  job.setMapOutputValueClass(VerticaRecord.class);

  /*************Settings for Vertica output************************/
  // Set the output format of the Reduce class.
  // I will output VerticaRecords that will be stored in the database
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(VerticaRecord.class);

  // Tell Hadoop to send its output to Vertica
  job.setOutputFormatClass(VerticaOutputFormat.class);
  /****************************************************************/

  job.setJarByClass(VerticaWordCount.class);
  job.setMapperClass(TokenizerMapper.class);
  FileInputFormat.addInputPath(job, new Path("/user/tmp/input"));

  /******************************************************************/
  // Defining the output table
  // VerticaOutputFormat.setOutput(jobObject, tableName, [truncate, ["columnName1 dataType1" [,"columnNamen dataTypen" ...]] );
  VerticaOutputFormat.setOutput(job, "target", true, "a int", "b varchar", "c varchar");

Comments

  • Hi,

    May I know which version of the connector you are using?


    Regards
    Bhawana
