How to change default block size in HDFS?
July 3, 2015
In this post we are going to see how to upload a file to HDFS while overriding the default block size. In older versions of Hadoop the default block size was 64 MB, and in newer versions it is 128 MB.
Let’s assume that the default block size in your cluster is 128 MB. What if you want all the datasets in your HDFS to use that default of 128 MB, but you would like one specific dataset to be stored with a block size of 256 MB?
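If you are not sure what the default block size on your cluster actually is, a quick way to check it from the command line is shown below (this is just a sketch; the value printed depends on your cluster's hdfs-site.xml):

# Print the configured default block size, in bytes (134217728 = 128 MB)
hdfs getconf -confKey dfs.blocksize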
Refer to the “HDFS – Why Another Filesystem” chapter in the FREE Hadoop Starter Kit course to learn more about the block size in other filesytems.
Why would you want to increase the block size of a specific dataset from 128 MB to 256 MB?
To answer this question, you need to understand the benefit of having a larger block size. A single HDFS block (64 MB, 128 MB or more) is written to disk sequentially. When you write data sequentially, there is a fair chance it will land in contiguous space on disk, meaning the data is written next to each other in a continuous fashion. When data is laid out on disk contiguously, it reduces the number of disk seeks during the read operation, resulting in a more efficient read. That is why the block size in HDFS is large compared to other file systems.
Now the question becomes: should I give my dataset a block size of 128 MB, 256 MB or even more? It all depends on your cluster capacity and the size of your datasets. Let's say you have a dataset which is 2 petabytes in size. A 64 MB block size for this dataset will result in 31 million+ blocks, which puts stress on the NameNode to manage all those blocks. Having a lot of blocks will also result in a lot of mappers during MapReduce execution. So in this case you may decide to increase the block size just for that dataset.
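To see where the 31 million figure comes from, here is a quick back-of-the-envelope calculation. The 2 PB dataset size is just the example from above, expressed in decimal megabytes:

# 2 PB is roughly 2,000,000,000 MB (decimal)
DATASET_MB=2000000000

# Approximate block counts at different block sizes
echo $((DATASET_MB / 64))     # 31,250,000 blocks at 64 MB
echo $((DATASET_MB / 128))    # 15,625,000 blocks at 128 MB
echo $((DATASET_MB / 256))    # 7,812,500 blocks at 256 MB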
Try it out
Try the commands in our cluster. Click here to get FREE access to the cluster.
Changing the block size when you upload a file in HDFS is very simple.
# Create the directory if it does not exist
hadoop fs -mkdir blksize

# Copy a file to HDFS with the default block size (128 MB)
hadoop fs -copyFromLocal /hirw-starterkit/hdfs/commands/dwp-payments-april10.csv blksize

# Override the default block size with 256 MB (268435456 bytes)
hadoop fs -D dfs.blocksize=268435456 -copyFromLocal /hirw-starterkit/hdfs/commands/dwp-payments-april10.csv blksize/dwp-payments-april10_256MB.csv
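Before jumping to the UI, you can also check the block size from the command line. This is an optional verification step, assuming the files were uploaded into the blksize directory as shown above:

# Print the block size (in bytes) recorded for each uploaded file
hadoop fs -stat "name: %n  block size: %o" blksize/dwp-payments-april10.csv
hadoop fs -stat "name: %n  block size: %o" blksize/dwp-payments-april10_256MB.csv

# Or inspect the individual blocks with fsck (use the full HDFS path)
hdfs fsck /user/<username>/blksize/dwp-payments-april10_256MB.csv -files -blocks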
Verify in UI
Make sure to update <username> with your username and <dir> with your directory name in the URL below.
http://ec2-54-175-1-192.compute-1.amazonaws.com:50070/explorer.html#/user/<username>/<dir>
For example:
http://ec2-54-175-1-192.compute-1.amazonaws.com:50070/explorer.html#/user/hirwuser150430/blksize