Hadoop Mapper and Reducer Output Mismatch
June 23, 2016
Can you have different output Key Value pair types for Mapper and Reducer in a MapReduce program?
Short answer – absolutely yes.
Below are the signatures for the Mapper and Reducer from the same MapReduce program, and both are perfectly valid.
public class MaxClosePriceMapper extends Mapper<LongWritable, Text, Text, FloatWritable>

public class MaxClosePriceReducer extends Reducer<Text, FloatWritable, FloatWritable, Text>
We know the above is valid. Yet when we execute the MapReduce program, the execution fails with the below error.
16/06/23 01:58:11 INFO mapreduce.Job: Task Id : attempt_1458616310472_2428_m_000002_0, Status : FAILED
Error: java.io.IOException: wrong key class: class org.apache.hadoop.io.FloatWritable is not class org.apache.hadoop.io.Text
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:196)
    at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1307)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1624)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
    at com.hirw.maxcloseprice.MaxClosePriceReducer.reduce(MaxClosePriceReducer.java:31)
    at com.hirw.maxcloseprice.MaxClosePriceReducer.reduce(MaxClosePriceReducer.java:14)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1645)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
What went wrong?
Here is the driver program. It looks OK, right? So what went wrong?

Job job = new Job();
job.setJarByClass(MaxClosePrice.class);
job.setJobName("MaxClosePrice");

//Set input and output locations
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

//Set Input and Output formats
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

//Set Mapper and Reducer classes
job.setMapperClass(MaxClosePriceMapper.class);
job.setReducerClass(MaxClosePriceReducer.class);

//Combiner (optional)
job.setCombinerClass(MaxClosePriceReducer.class);

//Map output types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(FloatWritable.class);

//Output types
job.setOutputKeyClass(FloatWritable.class);
job.setOutputValueClass(Text.class);

The problem is in the driver: we are using the Reducer class as the Combiner. The Combiner runs on the map side, and its output key-value pairs are sent as input to the Reducer. Because we reuse the Reducer as the Combiner, the Combiner's output key-value types are FloatWritable and Text, which do not match the Reducer's input key-value types, Text and FloatWritable. Hence the error.
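Notice that Hadoop catches this at spill time rather than at compile time: the map-side spill writer is only weakly typed, so it records the declared map-output key and value classes and checks each record the combiner emits against them at runtime (that is the check in IFile$Writer.append in the stack trace above). The plain-Java sketch below, with hypothetical class names and (String, Float) standing in for (Text, FloatWritable), illustrates why reusing the reducer as a combiner trips that check:

```java
// Hypothetical stand-in for Hadoop's map-side spill writer. It is configured
// with the declared map-output key/value classes and verifies every record
// appended to it at runtime, just as IFile$Writer.append does.
class SpillWriter {
    private final Class<?> keyClass;
    private final Class<?> valueClass;

    SpillWriter(Class<?> keyClass, Class<?> valueClass) {
        this.keyClass = keyClass;
        this.valueClass = valueClass;
    }

    void append(Object key, Object value) {
        if (!keyClass.isInstance(key)) {
            throw new IllegalStateException(
                "wrong key class: " + key.getClass().getName()
                + " is not " + keyClass.getName());
        }
        // value check and actual serialization elided in this sketch
    }
}

public class CombinerTypeMismatchDemo {
    public static void main(String[] args) {
        // Map output was declared as (String key, Float value), mirroring
        // setMapOutputKeyClass(Text.class) / setMapOutputValueClass(FloatWritable.class).
        SpillWriter writer = new SpillWriter(String.class, Float.class);

        // The reducer reused as a combiner emits (Float, String) -- swapped
        // types, just like Reducer<Text, FloatWritable, FloatWritable, Text>.
        try {
            writer.append(123.45f, "AAPL"); // key is a Float, not a String
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
            // prints: wrong key class: java.lang.Float is not java.lang.String
        }
    }
}
```

Hadoop's real check works the same way in spirit, which is why the job compiles cleanly and only fails once a map task spills through the combiner.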
So the Solution?
Don’t reuse the Reducer as the Combiner when the Reducer’s input and output key-value pair types do not match. In the above program, simply comment out the line below and the program will work.
//job.setCombinerClass(MaxClosePriceReducer.class);
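If you still want map-side combining, the alternative is a dedicated combiner class whose input and output types are both Text and FloatWritable. Reusing the max logic is then safe because max is associative: applying it once per map task and again in the reducer gives the same answer. The plain-Java sketch below (hypothetical names, with (String, float) standing in for (Text, FloatWritable)) checks that property:

```java
import java.util.*;

// Plain-Java sketch, not Hadoop code: shows that per-key max gives the same
// result whether or not a combining pass pre-aggregates each map partition,
// because the combiner's output types match the reducer's input types.
public class MaxCombinerDemo {
    // Per-key max: the same logic a Reducer<Text, FloatWritable, Text, FloatWritable>
    // combiner would run in its reduce() method.
    static Map<String, Float> maxByKey(List<Map.Entry<String, Float>> records) {
        Map<String, Float> out = new HashMap<>();
        for (Map.Entry<String, Float> r : records) {
            out.merge(r.getKey(), r.getValue(), Math::max);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Float>> mapOutput = List.of(
            Map.entry("AAPL", 101.5f), Map.entry("AAPL", 99.0f),
            Map.entry("GOOG", 55.0f),  Map.entry("AAPL", 103.2f));

        // Without a combiner: the reducer sees every map output record.
        Map<String, Float> direct = maxByKey(mapOutput);

        // With a combiner: each "map task" pre-aggregates its own partition,
        // and the reducer then aggregates the combiner outputs.
        List<Map.Entry<String, Float>> combinerOutput = new ArrayList<>();
        for (Map<String, Float> partial : List.of(
                maxByKey(mapOutput.subList(0, 2)),
                maxByKey(mapOutput.subList(2, 4)))) {
            partial.forEach((k, v) -> combinerOutput.add(Map.entry(k, v)));
        }
        Map<String, Float> withCombiner = maxByKey(combinerOutput);

        System.out.println(direct.equals(withCombiner)); // prints true
    }
}
```

In the real job this would mean keeping the Reducer's output types as FloatWritable/Text for the final result, while the separate combiner class keeps Text/FloatWritable on both its input and output sides.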