Configuring MultipleInputs-InputFormats-Mappers In Oozie MapReduce Action


Published by Big Data In Real World at February 8, 2014

This post explains how to write an Oozie MapReduce action with multiple inputs, where each input is configured to use its own InputFormat and Mapper.

Let's say you want to convert the configuration below into an Oozie action. There are two input directories, inputDir1 and inputDir2. Files under inputDir1 use TextInputFormat and should be processed by SampleMapper1. Files under inputDir2 use SequenceFileInputFormat and should be processed by SampleMapper2.

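import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

// Old (org.apache.hadoop.mapred) API. SampleProgram, SampleMapper1, SampleMapper2,
// SampleReducer and the log field are defined elsewhere in the program.
private JobConf getSampleConf(Path outputPath, Path inputDir1, Path inputDir2) {
  log.info("Creating configuration for sample job");
  JobConf conf = new JobConf(SampleProgram.class);
  conf.setOutputKeyClass(IntWritable.class);
  conf.setOutputValueClass(Text.class);
  conf.setCombinerClass(SampleReducer.class);
  conf.setReducerClass(SampleReducer.class);
  conf.setOutputFormat(TextOutputFormat.class);

  FileOutputFormat.setOutputPath(conf, outputPath);
  // Each input directory gets its own InputFormat and Mapper.
  MultipleInputs.addInputPath(conf, inputDir1, TextInputFormat.class, SampleMapper1.class);
  MultipleInputs.addInputPath(conf, inputDir2, SequenceFileInputFormat.class, SampleMapper2.class);

  conf.setJobName("sample");

  return conf;
}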
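In the old mapred API, each MultipleInputs.addInputPath call above records its directory and class as a path;class entry. These entries are appended to the comma-separated properties mapred.input.dir.formats and mapred.input.dir.mappers. Those are the same strings an Oozie action has to spell out by hand.

The plain-Java sketch below builds them from path-to-class pairs so they can be pasted into a workflow.xml. It is only an illustration under these assumptions: MultipleInputsProperties and encode are hypothetical helpers, not part of Hadoop or Oozie. The workflow's EL expressions are treated as literal strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper (not part of Hadoop or Oozie): builds values for
 * mapred.input.dir.formats / mapred.input.dir.mappers in the
 * comma-separated "path;class" form used by the old-API MultipleInputs.
 */
public class MultipleInputsProperties {

    /** Joins path -> class-name pairs as "path;class" entries separated by commas. */
    public static String encode(Map<String, String> pathToClass) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, String> e : pathToClass.entrySet()) {
            joiner.add(e.getKey() + ";" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // EL expressions are kept as plain text; Oozie resolves them at run time.
        String input1 = "${nameNode}/user/${wf:user()}/input1";
        String input2 = "${nameNode}/user/${wf:user()}/input2";

        // LinkedHashMap keeps insertion order, so the output order is predictable.
        Map<String, String> formats = new LinkedHashMap<>();
        formats.put(input1, "org.apache.hadoop.mapred.TextInputFormat");
        formats.put(input2, "org.apache.hadoop.mapred.SequenceFileInputFormat");

        Map<String, String> mappers = new LinkedHashMap<>();
        mappers.put(input1, "com.jerry.sample.SampleMapper1");
        mappers.put(input2, "com.jerry.sample.SampleMapper2");

        System.out.println("mapred.input.dir.formats = " + encode(formats));
        System.out.println("mapred.input.dir.mappers = " + encode(mappers));
    }
}
```

Running main prints the two values used in the action below, in the same order. Generating them from a single map per property makes it harder to forget an entry when more inputs are added.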

Here is the corresponding Oozie MapReduce action. Set the InputFormat class to DelegatingInputFormat and the Mapper class to DelegatingMapper. Then map each input path to its Mapper with the mapred.input.dir.mappers property, and to its InputFormat with the mapred.input.dir.formats property. Both properties take a comma-separated list of path;class entries.

  <action name="sample">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <!-- Remove output from a previous run; the job fails if the output directory already exists -->
        <delete path="${nameNode}/user/${wf:user()}/output"/>
      </prepare>
      <configuration>
        <!-- DelegatingInputFormat picks the InputFormat registered for each input path -->
        <property>
          <name>mapred.input.format.class</name>
          <value>org.apache.hadoop.mapred.lib.DelegatingInputFormat</value>
        </property>
        <!-- DelegatingMapper hands each split to the Mapper registered for its path -->
        <property>
          <name>mapred.mapper.class</name>
          <value>org.apache.hadoop.mapred.lib.DelegatingMapper</value>
        </property>
        <!-- Comma-separated path;MapperClass entries -->
        <property>
          <name>mapred.input.dir.mappers</name>
          <value>${nameNode}/user/${wf:user()}/input1;com.jerry.sample.SampleMapper1,${nameNode}/user/${wf:user()}/input2;com.jerry.sample.SampleMapper2</value>
        </property>
        <!-- Comma-separated path;InputFormatClass entries, using the same paths as above -->
        <property>
          <name>mapred.input.dir.formats</name>
          <value>${nameNode}/user/${wf:user()}/input1;org.apache.hadoop.mapred.TextInputFormat,${nameNode}/user/${wf:user()}/input2;org.apache.hadoop.mapred.SequenceFileInputFormat</value>
        </property>
        <property>
          <name>mapred.combiner.class</name>
          <value>com.jerry.sample.SampleReducer</value>
        </property>
        <property>
          <name>mapred.reducer.class</name>
          <value>com.jerry.sample.SampleReducer</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/user/${wf:user()}/output</value>
        </property>
        <property>
          <name>mapred.output.key.class</name>
          <value>org.apache.hadoop.io.IntWritable</value>
        </property>
        <property>
          <name>mapred.output.value.class</name>
          <value>org.apache.hadoop.io.Text</value>
        </property>
        <property>
          <name>mapred.output.format.class</name>
          <value>org.apache.hadoop.mapred.TextOutputFormat</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
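Every input path in mapred.input.dir.formats should have a matching entry in mapred.input.dir.mappers. Since both values are long strings edited by hand, a mismatch is easy to introduce.

The sketch below is a hypothetical pre-flight check, not part of Oozie or Hadoop. It parses each path;class list by splitting on commas and on the last semicolon of each entry. It then reports paths that appear in the formats list but not in the mappers list. The /user/jerry paths in main are made-up resolved values standing in for ${nameNode}/user/${wf:user()}/input1 and input2.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical consistency check (not part of Hadoop or Oozie) for the
 * "path;class,path;class" values of mapred.input.dir.formats and
 * mapred.input.dir.mappers.
 */
public class InputDirMappingCheck {

    /** Parses "path;class,path;class" into an ordered path -> class map. */
    public static Map<String, String> parse(String value) {
        Map<String, String> result = new LinkedHashMap<>();
        if (value == null || value.isEmpty()) {
            return result;
        }
        for (String entry : value.split(",")) {
            int sep = entry.lastIndexOf(';');
            // Reject entries with no path or no class name.
            if (sep <= 0 || sep == entry.length() - 1) {
                throw new IllegalArgumentException("Expected path;class but got: " + entry);
            }
            result.put(entry.substring(0, sep), entry.substring(sep + 1));
        }
        return result;
    }

    /** Paths listed in the first property value but missing from the second. */
    public static Set<String> onlyInFirst(String first, String second) {
        Set<String> missing = new LinkedHashSet<>(parse(first).keySet());
        missing.removeAll(parse(second).keySet());
        return missing;
    }

    /** True when both property values name exactly the same set of paths. */
    public static boolean samePaths(String formatsValue, String mappersValue) {
        return parse(formatsValue).keySet().equals(parse(mappersValue).keySet());
    }

    public static void main(String[] args) {
        // Hypothetical resolved values; input2 has no mapper entry.
        String formats = "/user/jerry/input1;org.apache.hadoop.mapred.TextInputFormat,"
                + "/user/jerry/input2;org.apache.hadoop.mapred.SequenceFileInputFormat";
        String mappers = "/user/jerry/input1;com.jerry.sample.SampleMapper1";

        System.out.println("Paths match: " + samePaths(formats, mappers));
        System.out.println("Paths without a mapper: " + onlyInFirst(formats, mappers));
    }
}
```

With the values in main, this prints "Paths match: false" followed by "Paths without a mapper: [/user/jerry/input2]". The parser deliberately does not trim whitespace. A stray space or line break inside a value then surfaces as a mismatched path instead of being silently hidden.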

Big Data In Real World

We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
