Configuring MultipleInputs-InputFormats-Mappers In Oozie MapReduce Action - Big Data In Real World


This post explains how to write an Oozie MapReduce action with multiple inputs, and how to configure each input to use a different InputFormat and Mapper.

Let's say you want to convert the configuration below into an Oozie action. Here you have two input directories, inputDir1 and inputDir2. Files under inputDir1 are read with TextInputFormat and should be mapped by SampleMapper1. Files under inputDir2 are read with SequenceFileInputFormat and should be mapped by SampleMapper2.

// Uses the old org.apache.hadoop.mapred API (JobConf, mapred.lib.MultipleInputs).
private JobConf getSampleConf(Path outputPath, Path inputDir1, Path inputDir2) {
  log.info("Creating configuration for sample job");
  JobConf conf = new JobConf(SampleProgram.class);
  conf.setOutputKeyClass(IntWritable.class);
  conf.setOutputValueClass(Text.class);
  // The same class serves as both combiner and reducer.
  conf.setCombinerClass(SampleReducer.class);
  conf.setReducerClass(SampleReducer.class);
  conf.setOutputFormat(TextOutputFormat.class);

  FileOutputFormat.setOutputPath(conf, outputPath);
  // Each input directory gets its own InputFormat and Mapper.
  MultipleInputs.addInputPath(conf, inputDir1, TextInputFormat.class, SampleMapper1.class);
  MultipleInputs.addInputPath(conf, inputDir2, SequenceFileInputFormat.class, SampleMapper2.class);

  conf.setJobName("sample");

  return conf;
}
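
As a rough sketch of what the two MultipleInputs.addInputPath calls above leave behind in the job configuration: in the old mapred API, each call appends a path;class entry to two comma-separated properties and switches the job to the delegating input format and mapper. The snippet below imitates that bookkeeping with a plain Map standing in for JobConf. The class MultipleInputsSketch and its methods are hypothetical and exist only for illustration, and the /user/jerry/... paths are placeholders, not values from a real cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MultipleInputsSketch {
    // Stand-in for JobConf: property name -> property value.
    private final Map<String, String> conf = new LinkedHashMap<>();

    // Imitates the bookkeeping done per input path (illustrative, not the Hadoop source).
    public void addInputPath(String path, String inputFormatClass, String mapperClass) {
        append("mapred.input.dir.formats", path + ";" + inputFormatClass);
        append("mapred.input.dir.mappers", path + ";" + mapperClass);
        conf.put("mapred.input.format.class",
                "org.apache.hadoop.mapred.lib.DelegatingInputFormat");
        conf.put("mapred.mapper.class",
                "org.apache.hadoop.mapred.lib.DelegatingMapper");
    }

    // Appends an entry to a comma-separated property, creating it if absent.
    private void append(String key, String entry) {
        conf.merge(key, entry, (existing, added) -> existing + "," + added);
    }

    public String get(String key) {
        return conf.get(key);
    }

    public static void main(String[] args) {
        MultipleInputsSketch sketch = new MultipleInputsSketch();
        sketch.addInputPath("/user/jerry/input1",
                "org.apache.hadoop.mapred.TextInputFormat",
                "com.jerry.sample.SampleMapper1");
        sketch.addInputPath("/user/jerry/input2",
                "org.apache.hadoop.mapred.SequenceFileInputFormat",
                "com.jerry.sample.SampleMapper2");
        // Print the four properties an Oozie action would have to set by hand.
        for (Map.Entry<String, String> e : sketch.conf.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Since an Oozie action is pure configuration and never runs this Java setup code, these four properties are the ones that have to be written out explicitly in the workflow.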

Here is the corresponding Oozie MapReduce action. For the InputFormat and Mapper classes, specify DelegatingInputFormat and DelegatingMapper respectively. Define the input paths and their corresponding mappers with the mapred.input.dir.mappers property, and the input format for each path with the mapred.input.dir.formats property. In both properties, each entry is written as path;class and entries are separated by commas.

  <action name="sample">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}/user/${wf:user()}/output"/>
      </prepare>
      <configuration>
        <property>
          <name>mapred.input.format.class</name>
          <value>org.apache.hadoop.mapred.lib.DelegatingInputFormat</value>
        </property>
        <property>
          <name>mapred.mapper.class</name>
          <value>org.apache.hadoop.mapred.lib.DelegatingMapper</value>
        </property>        
        <property>
          <name>mapred.input.dir.mappers</name>          
          <value>${nameNode}/user/${wf:user()}/input1;com.jerry.sample.SampleMapper1,${nameNode}/user/${wf:user()}/input2;com.jerry.sample.SampleMapper2</value>
        </property>
        <property>
          <name>mapred.input.dir.formats</name>          
          <value>${nameNode}/user/${wf:user()}/input1;org.apache.hadoop.mapred.TextInputFormat,${nameNode}/user/${wf:user()}/input2;org.apache.hadoop.mapred.SequenceFileInputFormat</value>
        </property>        
        <property>
          <name>mapred.combiner.class</name>
          <value>com.jerry.sample.SampleReducer</value>
        </property>
        <property>
          <name>mapred.reducer.class</name>
          <value>com.jerry.sample.SampleReducer</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/user/${wf:user()}/output</value>
        </property>        
        <property>
          <name>mapred.output.key.class</name>
          <value>org.apache.hadoop.io.IntWritable</value>
        </property>
        <property>
          <name>mapred.output.value.class</name>
          <value>org.apache.hadoop.io.Text</value>
        </property>        
        <property>
          <name>mapred.output.format.class</name>
          <value>org.apache.hadoop.mapred.TextOutputFormat</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
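
The action above lists every input path twice, once in mapred.input.dir.mappers and once in mapred.input.dir.formats, so a typo in one copy is easy to miss. Below is a minimal sketch of a pre-flight check that parses both values and reports paths that appear in only one of them. InputDirPropertyCheck is a hypothetical helper, not part of Hadoop or Oozie, and it assumes that paths contain neither commas nor semicolons.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InputDirPropertyCheck {
    // Splits "path;class,path;class" into an ordered path -> class map.
    public static Map<String, String> parse(String value) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : value.split(",")) {
            String[] parts = entry.trim().split(";");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected path;class but got: " + entry);
            }
            result.put(parts[0], parts[1]);
        }
        return result;
    }

    // Returns the paths that appear in only one of the two property values.
    public static Set<String> unmatchedPaths(String mappersValue, String formatsValue) {
        Set<String> mapperPaths = parse(mappersValue).keySet();
        Set<String> formatPaths = parse(formatsValue).keySet();
        Set<String> unmatched = new TreeSet<>(mapperPaths);
        unmatched.addAll(formatPaths);          // union of both path sets
        Set<String> common = new TreeSet<>(mapperPaths);
        common.retainAll(formatPaths);          // intersection of both path sets
        unmatched.removeAll(common);            // union minus intersection
        return unmatched;
    }

    public static void main(String[] args) {
        // Placeholder values; in practice, paste the resolved property values here.
        String mappers = "/user/jerry/input1;com.jerry.sample.SampleMapper1,"
                + "/user/jerry/input2;com.jerry.sample.SampleMapper2";
        String formats = "/user/jerry/input1;org.apache.hadoop.mapred.TextInputFormat,"
                + "/user/jerry/input2;org.apache.hadoop.mapred.SequenceFileInputFormat";
        System.out.println("Unmatched paths: " + unmatchedPaths(mappers, formats));
    }
}
```

The check compares path strings literally, so it has to run on values where the EL expressions such as ${nameNode} have already been substituted in the same way in both properties.
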
Big Data In Real World
