Apache Pig Tutorial – Load Variations The goal of this tutorial is to learn Apache Pig concepts at a fast pace. So don’t expect lengthy posts. All posts […]
Hadoop Archives (HAR) Hadoop Archives (HAR) offers an effective way to deal with the small files problem. This post will explain – The problem with small […]
Datanode Block Scanner In this blog post we saw how HDFS handles and corrects data corruption using checksums. During a write operation the datanode […]
Can Reducer always be reused for Combiner? A Combiner function is an optional intermediary function which is executed in the Map phase right after the execution […]
What is HDFS Federation? Namenode is responsible for the successful operation of HDFS. Namenode holds the entire metadata of HDFS, which includes information about files and […]
Changing Number Of Mappers The number of mappers always equals the number of splits. Having said that, it is possible to control the number of splits […]
InputSplit vs Block The central idea behind MapReduce is distributed processing, and hence the most important thing is to divide the dataset into chunks and […]