
# How to change default block size in HDFS?

In this post we are going to see how to upload a file to HDFS, overriding the default block size. In older versions of Hadoop the default block size was 64 MB; in newer versions it is 128 MB.
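As a quick sanity check, you can print the default block size your cluster is configured with. A minimal sketch (the command prints the configured value in bytes, e.g. 134217728 for 128 MB):

```
hdfs getconf -confKey dfs.blocksize
```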

Let’s assume the default block size in your cluster is 128 MB. What if you want all the datasets in your HDFS to use that default, but you would like one specific dataset to be stored with a block size of 256 MB? You can pass the block size on the command line when you upload the file, as shown below.
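A minimal sketch of such an upload; the file name and target path are placeholders. 268435456 is 256 × 1024 × 1024 bytes (recent Hadoop versions also accept suffixed values such as 256m):

```
# Upload one file with a 256 MB block size, overriding the cluster default
hdfs dfs -D dfs.blocksize=268435456 -put largedataset.csv /user/hadoop/datasets/
```

Only this upload is affected; the cluster-wide default (dfs.blocksize in hdfs-site.xml) stays unchanged.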

Refer to the “HDFS – Why Another Filesystem” chapter in the FREE Hadoop Starter Kit course to learn more about block sizes in other filesystems.

## Why would you want to increase the block size of a specific dataset from 128 MB to 256 MB?

To answer this question, you need to understand the benefit of having a larger block size. A single HDFS block (64 MB, 128 MB or more) is written to disk sequentially. When you write data sequentially, there is a fair chance it will land in contiguous space on disk, meaning the data is written next to each other in a continuous fashion. Data laid out contiguously on disk reduces the number of disk seeks during the read operation, resulting in an efficient read. That is why the block size in HDFS is huge compared to other file systems.

Now the question becomes: should I make my dataset’s block size 128 MB, 256 MB, or even more? It all depends on your cluster capacity and the size of your datasets. Let’s say you have a dataset which is 2 petabytes in size. A 64 MB block size for this dataset will result in 31 million+ blocks, which would put stress on the NameNode managing all those blocks. Having a lot of blocks will also result in a lot of mappers during MapReduce execution. So in this case you may decide to increase the block size just for that dataset.
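To see where that figure comes from, here is a quick back-of-the-envelope check, assuming decimal petabytes (2 PB = 2 × 10⁹ MB):

```
# 2 PB expressed in MB, divided by a 64 MB block size
echo $(( 2 * 10**9 / 64 ))   # 31250000 blocks, i.e. 31 million+
```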

## Try it out

Try the commands in our cluster. Click here to get FREE access to the cluster.
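To confirm an override worked, you can inspect the block size HDFS actually recorded for the file. A sketch using the placeholder path from the upload example above:

```
# Print the file's block size in bytes (%o is the block-size field of -stat)
hdfs dfs -stat %o /user/hadoop/datasets/largedataset.csv

# List the file's blocks and their sizes (add -locations for DataNode placement)
hdfs fsck /user/hadoop/datasets/largedataset.csv -files -blocks
```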