This post lists the steps to properly remove (decommission) a node from a Hadoop cluster. It is not advisable to simply shut the node down abruptly.
Node exclusions should be recorded in a file referred to by the property dfs.hosts.exclude. This property has no default value, so in the absence of a configured file location and the file itself, the Hadoop cluster will not exclude any nodes.
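As a minimal illustration, the property in hdfs-site.xml might look like the following (the file path here is just an example; use whatever location suits your installation):

```xml
<!-- hdfs-site.xml: location of the decommission excludes file (path is an example) -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```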
When dfs.hosts.exclude is not set
Follow the steps below when dfs.hosts.exclude is not set in your cluster:
- Shut down the NameNode
- Edit hdfs-site.xml and add an entry for dfs.hosts.exclude pointing to the location of the exclusion file
- Add the hostname of the node you plan to remove to the file referenced by dfs.hosts.exclude
- Start the NameNode
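The steps above can be sketched as shell commands. This is only a sketch: the excludes file path, the hostname, and the daemon commands are assumptions, so adjust them for your distribution and Hadoop version.

```shell
# Example excludes file path; real clusters often use something like
# /etc/hadoop/conf/dfs.exclude
EXCLUDE_FILE="${EXCLUDE_FILE:-/tmp/dfs.exclude}"

# 1. Stop the NameNode (Hadoop 3.x syntax; older releases use
#    hadoop-daemon.sh stop namenode)
if command -v hdfs >/dev/null; then hdfs --daemon stop namenode; fi

# 2. In hdfs-site.xml, set dfs.hosts.exclude to the excludes file path
#    (edit the file by hand or via your cluster management tool)

# 3. Record the host being removed in the excludes file
#    (worker-03.example.com is a placeholder hostname)
echo "worker-03.example.com" >> "$EXCLUDE_FILE"

# 4. Start the NameNode again
if command -v hdfs >/dev/null; then hdfs --daemon start namenode; fi
```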
When dfs.hosts.exclude is already set
Add the hostname of the node you plan to remove to the file referenced by dfs.hosts.exclude.
After adding the hostname to the exclusion file, run the command below to exclude the node from functioning as a DataNode:
hdfs dfsadmin -refreshNodes
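After refreshNodes is issued, the DataNode enters a decommissioning state while its blocks are re-replicated. A sketch of refreshing and then monitoring that progress (this needs a running HDFS cluster with hdfs on the PATH; the state strings are what the admin report prints):

```shell
if command -v hdfs >/dev/null; then
  # Make the NameNode re-read the excludes file
  hdfs dfsadmin -refreshNodes
  # DataNodes being removed show "Decommission in progress" in the report,
  # then "Decommissioned" once all their blocks are re-replicated
  hdfs dfsadmin -report
else
  echo "hdfs not on PATH; run this on a cluster node"
fi
```

Wait for the node to reach the Decommissioned state before powering it off.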
The command below will exclude the node from functioning as a NodeManager:
yarn rmadmin -refreshNodes
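Note that YARN reads its own excludes file, configured by yarn.resourcemanager.nodes.exclude-path in yarn-site.xml; rmadmin -refreshNodes tells the ResourceManager to re-read it. A sketch of refreshing and verifying (assumes HADOOP_HOME points at your Hadoop installation, to avoid confusion with other tools named yarn):

```shell
if [ -x "${HADOOP_HOME:-/nonexistent}/bin/yarn" ]; then
  # Make the ResourceManager re-read the file set in
  # yarn.resourcemanager.nodes.exclude-path
  "$HADOOP_HOME/bin/yarn" rmadmin -refreshNodes
  # -all includes inactive nodes; excluded nodes appear as DECOMMISSIONED
  "$HADOOP_HOME/bin/yarn" node -list -all
else
  echo "HADOOP_HOME not set; run this on a cluster node"
fi
```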
Why not just shut down the nodes?
Abruptly shutting down a node causes the HDFS blocks stored on it to become under-replicated; HDFS will then start copying those blocks from the remaining nodes to other nodes to bring the replication factor back up (3 by default). Abruptly shutting down a node can also cause MapReduce and other jobs executing in the cluster to fail.
Removing the node by excluding it as described above first replicates the blocks on the node being removed to other nodes, stops the node from taking new jobs, and waits for its running jobs to complete. Hence this approach should be followed to safely remove nodes from the cluster.