Hadoop – Page 5 – Big Data In Real World

JobTracker and TaskTracker

JobTracker and TaskTracker are two essential processes involved in MapReduce execution in MRv1 (Hadoop version 1). Both processes are now deprecated in […]

Published by Big Data In Real World at July 12, 2015

Categories

Hadoop

NameNode and DataNode

In this post let’s talk about the two important types of nodes in your Hadoop cluster and their functions – NameNode and DataNode. […]

Published by Big Data In Real World at July 3, 2015

How to change default replication factor?

What is replication factor? The replication factor dictates how many copies of a block are kept in your cluster. […]

Published by Big Data In Real World at June 25, 2015

How to change default block size in HDFS

In this post we are going to see how to upload a file to HDFS, overriding the default […]

Published by Big Data In Real World at June 25, 2015

What is DistCp?

The “Working With HDFS” chapter in our Hadoop Starter Kit course covers the details of working with HDFS. In that chapter we looked at how to copy, […]

Published by Big Data In Real World at June 20, 2015

What is Hadoop?

A beginner’s tutorial to understand the Big Data problem and Hadoop. In this post we looked at what Big Data is. To learn […]

Published by Big Data In Real World at June 19, 2015

What is Big Data?

A beginner’s tutorial. In this blog post, we are going to see the following: What is Big Data? Examples of Big […]

Published by Big Data In Real World at June 5, 2015

The Power Of Big Data

One of my close friends recently joined Microsoft in Seattle, in their highly acclaimed data analysis team. I asked him what his first assignment was. He […]

Published by Big Data In Real World at October 22, 2014

Preparing for Hadoop Interview

Three years ago, only a small number of companies were using Hadoop. Since then, Hadoop has grown by leaps and bounds, and so has its user base. Companies […]

Published by Big Data In Real World at September 24, 2014

How do you debug a performance issue or a long running job in Hadoop?

This post explains how you can approach the above question when it is asked in an interview. This is an open-ended interview question and the interviewer […]

Published by Big Data In Real World at June 18, 2014

Explaining ToolRunner

This post explains the class relationships involved when we use ToolRunner to run a MapReduce job. It is not really complicated, but we use the below pictorial […]

Published by Big Data In Real World at May 11, 2014

MRUnit To Test MapReduce

This post explains how to unit test a MapReduce program using MRUnit. Apache MRUnit™ is a Java library that helps developers unit test Apache Hadoop map […]

Published by Big Data In Real World at April 19, 2014

Using Million Song Dataset In Hadoop

What is the Million Song Dataset? The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks. The […]

Published by Big Data In Real World at April 4, 2014

Input For Page Ranking Using Hadoop

If you are new to Hadoop, you are probably tired of WordCount and want to get hands-on with some real use cases. Page ranking is an […]

Published by Big Data In Real World at March 3, 2014

Fixing org.apache.hadoop.security.AccessControlException: Permission denied

Executions in Hadoop use the underlying logged-in username to determine permissions in the cluster. When running jobs or working with HDFS, the user […]

Published by Big Data In Real World at February 17, 2014

One Of Several Explanations To “could only be replicated to 0 nodes” Error

There could be several reasons why you see the “could only be replicated to 0 nodes” message in an exception when you are trying to write something […]

Published by Big Data In Real World at February 8, 2014

Configuring MultipleInputs-InputFormats-Mappers In Oozie MapReduce Action

This post explains how to write an Oozie MapReduce action with multiple inputs, and how each input is configured to use a different InputFormat and Mapper. Let’s […]

Published by Big Data In Real World at February 2, 2014

Fixing java.io.IOException: Incompatible namespaceIDs

This post explains the fix for the error below, which you may see when starting the DataNode. 2013-12-14 23:39:09,354 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-hadoop-user/dfs/data: namenode namespaceID […]