In this post we are going to see how to upload a file to HDFS while overriding the default block size. In older versions of Hadoop the default block size was 64 MB, and in newer versions it is 128 MB.

Let's assume the default block size in your cluster is 128 MB. What if you want all the datasets in your HDFS to keep the default 128 MB block size, but you would like one specific dataset to be stored with a block size of 256 MB?

Refer to the "HDFS – Why Another Filesystem" chapter in the FREE Hadoop Starter Kit course to learn more about block sizes in other filesystems.

Why would you want to make the block size of a specific dataset 256 MB instead of 128 MB?

To answer this question, you need to understand the benefit of a larger block size. A single HDFS block (64 MB, 128 MB, or more) is written to disk sequentially. When you write data sequentially, there is a fair chance it will land in contiguous space on disk, meaning the data sits next to each other in a continuous fashion. When data is laid out on disk contiguously, reads need fewer disk seeks, which makes them more efficient. That is why the block size in HDFS is large compared to other file systems.

Now the question becomes: should I make my dataset's block size 128 MB, 256 MB, or even more? It all depends on your cluster capacity and the size of your datasets. Let's say you have a dataset which is 2 petabytes in size. A 64 MB block size for this dataset would result in 31 million+ blocks, which would put stress on the NameNode to manage all those blocks. Having a lot of blocks will also result in a lot of mappers during MapReduce execution. So in this case you may decide to increase the block size just for that dataset.
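As a rough sanity check on those numbers (working in decimal units, and using a hypothetical /user/hadoop/big-dataset path purely for illustration), the block-count arithmetic works out like this, and hdfs fsck lets you see how many blocks an existing file actually occupies:

# Block counts for a 2 PB dataset at different block sizes (decimal units):
#   2 PB / 64 MB  = 2,000,000,000 MB / 64  ~= 31.25 million blocks
#   2 PB / 128 MB = 2,000,000,000 MB / 128 ~= 15.6  million blocks
#   2 PB / 256 MB = 2,000,000,000 MB / 256 ~= 7.8   million blocks
# Every one of those blocks is metadata the NameNode has to track in memory.

# List the blocks behind an existing file or directory (path is hypothetical).
hdfs fsck /user/hadoop/big-dataset -files -blocks | head -n 20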
Try it out

Try the commands in our cluster. Click here to get FREE access to the cluster.
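Here is a minimal sketch of such an upload, assuming a Hadoop 2.x or later cluster where the block size property is dfs.blocksize (older 1.x releases used dfs.block.size) and using hypothetical local and HDFS paths:

# Upload a file with a 256 MB block size instead of the cluster default.
# 268435456 bytes = 256 * 1024 * 1024.
hdfs dfs -D dfs.blocksize=268435456 -put sales-2015.csv /user/hadoop/sales/

# Verify the block size of the uploaded file; %o prints the block size in bytes.
hdfs dfs -stat "%o" /user/hadoop/sales/sales-2015.csv

Only files uploaded with this override get the larger block size; the cluster-wide default in hdfs-site.xml stays untouched, which is exactly what you want when a single dataset needs different treatment.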