HDFS Block Placement Policy - Big Data In Real World

HDFS Block Placement Policy

When a file is uploaded into HDFS, it is divided into blocks, and HDFS has to decide where to place each of these blocks in the cluster. The HDFS block placement policy defines the strategy for how and where the replicas of each block are placed.

Why Is Placement Policy Important?

Placement policy is important because it tries to keep the cluster balanced, so that blocks are evenly distributed across the nodes. At the same time, it has to keep the blocks properly redundant. There is no point in storing all the replicas of a block on one node, because that node then becomes a single point of failure for the block.
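The redundancy argument can be made concrete with a small check. The following is only an illustrative sketch, not HDFS code: the function name `survives_failures` and the `(rack, node)` tuple format are assumptions made for this example. It reports whether at least one replica of a block would remain after losing any single node or any single rack.

```python
def survives_failures(replicas):
    """Check how well one block's replicas tolerate failures.

    replicas: list of (rack, node) tuples, one per replica
              (a hypothetical format chosen for this sketch).
    """
    nodes = set(replicas)                  # distinct nodes holding a replica
    racks = {rack for rack, _ in replicas} # distinct racks holding a replica
    return {
        # With replicas on two or more nodes, losing one node leaves a copy.
        "node_failure": len(nodes) > 1,
        # With replicas on two or more racks, losing one rack leaves a copy.
        "rack_failure": len(racks) > 1,
    }


# All replicas on one node: that node is a single point of failure.
print(survives_failures([("rack1", "n1")] * 3))

# Replicas spread over two racks and three nodes.
print(survives_failures([("rack1", "n1"), ("rack2", "n3"), ("rack2", "n4")]))
```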

The block placement policy has changed between Hadoop versions, and several strategies exist. Since Hadoop 0.21.0, the placement strategies are pluggable.

Default Placement Policy

The first replica of a block is stored on the same node as the client that is uploading the file, provided the client is running on a DataNode. Otherwise HDFS chooses a node for it.

The second replica is stored on a node in a different rack from the one that holds the first replica.

The third replica is stored in the same rack as the second replica, but on a different node.
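The three rules above can be sketched in a few lines of Python. This is a simplified illustration and not the actual HDFS implementation: the function `choose_targets`, the dictionary-based topology, and the purely random selection are all assumptions made for this example. The real policy also weighs factors such as free space and node load. The sketch assumes at least two racks, and at least two nodes in the rack chosen for the second replica.

```python
import random


def choose_targets(topology, writer=None, rng=random):
    """Pick three (rack, node) targets for one block.

    topology: dict mapping rack name -> list of node names
              (a hypothetical format for this sketch).
    writer:   (rack, node) of the client if it runs on a DataNode, else None.
    rng:      source of randomness, e.g. random.Random(seed) for repeatable runs.
    """
    all_nodes = [(rack, node) for rack, nodes in topology.items() for node in nodes]

    # Replica 1: the writer's own node when it is a DataNode, otherwise any node.
    first = writer if writer in all_nodes else rng.choice(all_nodes)

    # Replica 2: a node on a rack other than the first replica's rack.
    remote = [t for t in all_nodes if t[0] != first[0]]
    if not remote:
        raise ValueError("sketch needs at least two racks")
    second = rng.choice(remote)

    # Replica 3: same rack as replica 2, but a different node.
    same_rack = [t for t in all_nodes if t[0] == second[0] and t != second]
    if not same_rack:
        raise ValueError("sketch needs a second node in the remote rack")
    third = rng.choice(same_rack)

    return [first, second, third]


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
print(choose_targets(topology, writer=("rack1", "n1")))
```

With this layout, losing any single node or any single rack still leaves at least one replica of the block, while only one of the three writes has to cross racks twice.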

[Figure: Hadoop Placement Policy]

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
