This post explains how to write an Oozie MapReduce action with multiple inputs, where each input is configured to use a different InputFormat and Mapper.
Let's say you want to convert the configuration below into an Oozie action. Here you have two input directories, inputDir1 and inputDir2. Files under inputDir1 are of TextInputFormat and should be mapped by SampleMapper1. Files under inputDir2 are of SequenceFileInputFormat and should be mapped by SampleMapper2.
private JobConf getSampleConf(Path outputPath, Path inputDir1, Path inputDir2) {
    log.info("Creating configuration for sample job");
    JobConf conf = new JobConf(SampleProgram.class);
    conf.setOutputKeyClass(IntWritable.class);
    conf.setOutputValueClass(Text.class);
    conf.setCombinerClass(SampleReducer.class);
    conf.setReducerClass(SampleReducer.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(conf, outputPath);
    MultipleInputs.addInputPath(conf, inputDir1, TextInputFormat.class, SampleMapper1.class);
    MultipleInputs.addInputPath(conf, inputDir2, SequenceFileInputFormat.class, SampleMapper2.class);
    conf.setJobName("sample");
    return conf;
}
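The SampleMapper1 and SampleMapper2 classes are only referenced by name in the configuration above, so here is a minimal sketch of what they might look like with the old mapred API, each in its own source file. The map output types (IntWritable/Text) and the SequenceFile record types consumed by SampleMapper2 are assumptions, and the map logic is placeholder only.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Consumes plain text lines from inputDir1 (TextInputFormat: byte offset + line).
public class SampleMapper1 extends MapReduceBase
        implements Mapper<LongWritable, Text, IntWritable, Text> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<IntWritable, Text> output, Reporter reporter)
            throws IOException {
        // Placeholder logic: key each line by its length.
        output.collect(new IntWritable(line.getLength()), line);
    }
}

// Consumes records from the sequence files under inputDir2; the IntWritable/Text
// record types here are an assumption about how those files were written.
public class SampleMapper2 extends MapReduceBase
        implements Mapper<IntWritable, Text, IntWritable, Text> {
    public void map(IntWritable key, Text value,
                    OutputCollector<IntWritable, Text> output, Reporter reporter)
            throws IOException {
        // Placeholder logic: pass records through unchanged.
        output.collect(key, value);
    }
}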
Here is the corresponding Oozie MapReduce action. For the InputFormat and Mapper classes we give DelegatingInputFormat and DelegatingMapper respectively. Define the input paths and their corresponding mappers with the mapred.input.dir.mappers property, and define the input formats for the files under each input path with the mapred.input.dir.formats property.
<action name="sample">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${nameNode}/user/${wf:user()}/output"/>
        </prepare>
        <configuration>
            <property>
                <name>mapred.input.format.class</name>
                <value>org.apache.hadoop.mapred.lib.DelegatingInputFormat</value>
            </property>
            <property>
                <name>mapred.mapper.class</name>
                <value>org.apache.hadoop.mapred.lib.DelegatingMapper</value>
            </property>
            <property>
                <name>mapred.input.dir.mappers</name>
                <value>${nameNode}/user/${wf:user()}/input1;com.jerry.sample.SampleMapper1,${nameNode}/user/${wf:user()}/input2;com.jerry.sample.SampleMapper2</value>
            </property>
            <property>
                <name>mapred.input.dir.formats</name>
                <value>${nameNode}/user/${wf:user()}/input1;org.apache.hadoop.mapred.TextInputFormat,${nameNode}/user/${wf:user()}/input2;org.apache.hadoop.mapred.SequenceFileInputFormat</value>
            </property>
            <property>
                <name>mapred.combiner.class</name>
                <value>com.jerry.sample.SampleReducer</value>
            </property>
            <property>
                <name>mapred.reducer.class</name>
                <value>com.jerry.sample.SampleReducer</value>
            </property>
            <property>
                <name>mapred.output.dir</name>
                <value>/user/${wf:user()}/output</value>
            </property>
            <property>
                <name>mapred.output.key.class</name>
                <value>org.apache.hadoop.io.IntWritable</value>
            </property>
            <property>
                <name>mapred.output.value.class</name>
                <value>org.apache.hadoop.io.Text</value>
            </property>
            <property>
                <name>mapred.output.format.class</name>
                <value>org.apache.hadoop.mapred.TextOutputFormat</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
</action>
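If you want to double-check the values to put into mapred.input.dir.mappers and mapred.input.dir.formats, one option is to build the JobConf in Java and print the properties that MultipleInputs sets. A small sketch, with placeholder input paths:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

import com.jerry.sample.SampleMapper1;
import com.jerry.sample.SampleMapper2;

public class DumpMultipleInputsConf {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Same MultipleInputs calls as in getSampleConf(), with placeholder paths.
        MultipleInputs.addInputPath(conf, new Path("/user/jerry/input1"),
                TextInputFormat.class, SampleMapper1.class);
        MultipleInputs.addInputPath(conf, new Path("/user/jerry/input2"),
                SequenceFileInputFormat.class, SampleMapper2.class);

        // These are the values to copy into the corresponding Oozie <property> elements.
        System.out.println(conf.get("mapred.input.format.class")); // DelegatingInputFormat
        System.out.println(conf.get("mapred.mapper.class"));       // DelegatingMapper
        System.out.println(conf.get("mapred.input.dir.mappers"));
        System.out.println(conf.get("mapred.input.dir.formats"));
    }
}

Note that the sketch prints bare paths, while the workflow above gives the input directories as fully qualified HDFS URIs prefixed with ${nameNode}.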