Big Data In Real World – Page 13

Published by Big Data In Real World on August 9, 2015

Categories

Hadoop

Changing Number Of Mappers

The number of mappers always equals the number of splits. Having said that, it is possible to control the number of splits […]

Published by Big Data In Real World on August 4, 2015

Categories

Hadoop

InputSplit vs Block

The central idea behind MapReduce is distributed processing, and hence the most important thing is to divide the dataset into chunks and […]

Published by Big Data In Real World on August 1, 2015

Categories

Hadoop

HDFS Block Placement Policy

When a file is uploaded into HDFS, it is divided into blocks. HDFS has to decide where to […]

Published by Big Data In Real World on July 28, 2015

Categories

Hadoop

Data Locality in Hadoop

Data locality in Hadoop refers to the “proximity” of the data to the Mapper tasks working on that data. Why […]

Published by Big Data In Real World on July 19, 2015

Categories

Hadoop

Hadoop Modes

A Hadoop cluster is made up of several key processes, and each process is designed to do a specific task. Here are the key daemons […]

Published by Big Data In Real World on July 14, 2015

Categories

Hadoop

JobTracker and TaskTracker

JobTracker and TaskTracker are two essential processes involved in MapReduce execution in MRv1 (Hadoop version 1). Both processes are now deprecated in […]

Published by Big Data In Real World on July 12, 2015

Categories

Hadoop

NameNode and DataNode

In this post, let’s talk about the two important types of nodes in your Hadoop cluster and their functions: NameNode and DataNode. […]

Published by Big Data In Real World on July 3, 2015

Categories

Hadoop

How to change default replication factor?

What is replication factor? Replication factor dictates how many copies of a block should be kept in your cluster. […]

Published by Big Data In Real World on June 25, 2015

Categories

Hadoop

How to change default block size in HDFS

In this post, we are going to see how to upload a file to HDFS while overriding the default […]

Published by Big Data In Real World on June 25, 2015

Categories

Hadoop

What is DistCp?

The “Working With HDFS” chapter in our Hadoop Starter Kit course covers the details of working with HDFS. In that chapter, we looked at how to copy, […]

Published by Big Data In Real World on June 20, 2015

Categories

Hadoop

What is Hadoop?

A beginner’s tutorial to understand the Big Data problem and Hadoop. In this post we looked at “What is Big Data?”. To learn […]

Published by Big Data In Real World on June 19, 2015

Categories

Hadoop

What is Big Data?

A beginner’s tutorial. In this blog post, we are going to see the following: What is Big Data? Examples of Big […]

Published by Big Data In Real World on June 5, 2015

Categories

Hadoop

The Power Of Big Data

One of my close friends recently joined Microsoft’s highly acclaimed data analysis team in Seattle. I asked him what his first assignment was. He […]

Published by Big Data In Real World on October 22, 2014

Categories

Hadoop

Preparing for Hadoop Interview

Three years ago, only a small number of companies were using Hadoop. Since then, Hadoop has grown by leaps and bounds, and so has its user base. Companies […]

Published by Big Data In Real World on September 24, 2014

Categories

Hadoop

How do you debug a performance issue or a long running job in Hadoop?

This post explains how you can approach the above question when asked in an interview. This is an open-ended interview question, and the interviewer […]

Published by Big Data In Real World on June 18, 2014

Categories

Hadoop

Explaining ToolRunner

This post explains the class relationships involved when we use ToolRunner to run a MapReduce job. It is not really complicated, but we use the following pictorial […]

Published by Big Data In Real World on May 11, 2014

Categories

Hadoop

MRUnit To Test MapReduce

This post explains how to unit test a MapReduce program using MRUnit. Apache MRUnit™ is a Java library that helps developers unit test Apache Hadoop map […]

Published by Big Data In Real World on April 19, 2014

Categories

Hadoop

Using Million Song Dataset In Hadoop

What is the Million Song Dataset? The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks. The […]

Published by Big Data In Real World on April 4, 2014

Categories

Hadoop

Input For Page Ranking Using Hadoop

If you are new to Hadoop, you are probably tired of WordCount and want to get hands-on with some real use cases. Page ranking is an […]

Published by Big Data In Real World on March 3, 2014

Categories

Hadoop

Fixing org.apache.hadoop.security.AccessControlException: Permission denied

Hadoop uses the username of the logged-in user to determine permissions in the cluster. When running jobs or working with HDFS, the user […]