Hadoop safemode recovery – taking too long!

January 16, 2017
Any time the NameNode is started or restarted, it first goes into a maintenance state called Safe Mode. While the NameNode is in Safe Mode it does not allow any changes (writes) to the file system, which means that during Safe Mode the HDFS cluster is read-only and the NameNode does not replicate or delete blocks.
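If you want to confirm whether the NameNode is currently in Safe Mode, the dfsadmin tool can report it (safemode get is a standard HDFS command):

hdfs dfsadmin -safemode get

It prints either "Safe mode is ON" or "Safe mode is OFF".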
Why Safemode?
NameNode persists its metadata in a file called FSIMAGE. FSIMAGE holds the metadata of HDFS: for example, information about files, folders, permissions, and created/modified timestamps. It also holds the block composition of files; for example, it records that file abc is made up of blocks x, y and z. However, it does not store the locations of blocks x, y and z.
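If you are curious about what FSIMAGE actually contains, the Offline Image Viewer (hdfs oiv) can dump it to XML. The fsimage path below is only an illustration; the real file sits under whatever directory dfs.namenode.name.dir points to.

hdfs oiv -p XML -i /path/to/dfs/name/current/fsimage_0000000000000012345 -o /tmp/fsimage.xml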
During startup, the NameNode receives block reports containing block location information from all DataNodes. To leave Safemode, the NameNode needs to collect reports for at least a threshold percentage of blocks (specified by dfs.namenode.safemode.threshold-pct). If the NameNode came out of Safemode prematurely, before receiving enough block locations from the DataNodes, it would think those blocks are under-replicated and start unnecessary replication, so it waits for the threshold to be reached. Once the threshold is reached, the NameNode automatically comes out of Safemode and starts servicing HDFS clients.
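For reference, the threshold is configured in hdfs-site.xml. The default is 0.999; the value below is just an illustration of how you would relax it slightly if you chose to.

<property>
  <name>dfs.namenode.safemode.threshold-pct</name>
  <value>0.95</value>
</property>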
Sometimes it can take a really long time for the NameNode to come out of Safemode. If you check the NameNode logs, you would see something like the message below. Here the log says the NameNode needs information about 7,183 more blocks before it can safely come out of Safemode.
The reported blocks 319128 needs additional 7183 blocks to reach the threshold 0.9990 of total blocks 326638. Safe mode will be turned off automatically.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1711)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1691)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:565)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)
Why does the NameNode stay in Safemode for too long?
Sometimes the NameNode stays in Safemode for a very long time. Here are some of the possible reasons and how you can address them.
dfs.namenode.handler.count could be too low
dfs.namenode.handler.count in hdfs-site.xml specifies the number of threads the NameNode uses for its processing. By default it is set to 10. For a big cluster you need more threads to speed up the processing, so make sure dfs.namenode.handler.count is at least 10 on a big cluster, and if you frequently face slow NameNode startup, consider increasing the number. You will have to test to find the right value for your cluster.
<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value>
  <final>true</final>
</property>
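One commonly cited rule of thumb from Hadoop tuning guides sizes the handler count at roughly 20 times the natural logarithm of the number of nodes in the cluster. The one-liner below just computes that for a hypothetical 200-node cluster (about 105); treat it as a starting point, not a definitive setting.

python -c 'import math; print(int(math.log(200) * 20))'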
Missing or Corrupt Blocks
If you have issues with a lot of DataNodes, you will obviously be missing a lot of blocks, and that will keep the NameNode in Safemode for a long time because it is missing too many blocks to reach the threshold. In such a case, make sure all DataNodes are up and connected to the NameNode.
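A quick way to check is the dfsadmin report, which lists live and dead DataNodes along with their capacity and last contact time:

hdfs dfsadmin -report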
The other reason for extended Safemode is blocks that are corrupt and cannot be recovered. In the case of corrupted blocks, we have to delete them.
In such scenarios, force the NameNode out of Safemode by running the command below.
hadoop dfsadmin -safemode leave
Then use the fsck command to look at the blocks for all files in your cluster.
hdfs fsck /
Look through the output for missing or corrupt blocks (ignore under-replicated blocks for now). This command is really verbose, especially on a large HDFS filesystem, so run the command below instead, which ignores lines containing nothing but dots and lines about replication.
hdfs fsck / | egrep -v '^\.+$' | grep -v replica
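Alternatively, fsck can list just the corrupt files and their blocks, which is usually a much shorter output:

hdfs fsck / -list-corruptfileblocks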
Once you find a file that is corrupt, use that output to determine where its blocks might live. If the file is larger than your block size, it will have multiple blocks.
hdfs fsck /path/to/corrupt/file -locations -blocks -files
You can use the reported block IDs to go around to the DataNodes and the NameNode logs, searching for the machine or machines on which the blocks lived. Look for filesystem errors on those machines: missing mount points, a DataNode that is not running, a file system that was reformatted or re-provisioned. If you can find a problem that way and bring the block back online, that file will be healthy again. Do this until all files are healthy or you exhaust all alternatives looking for the blocks.
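For example, you could grep the NameNode and DataNode logs for one of the reported block IDs. The block ID and log directory below are placeholders; adjust them for your installation.

grep "blk_1073741825" /var/log/hadoop-hdfs/*.log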
If you still cannot determine the reason for the corrupt or missing blocks, you have no option other than removing the file from the system to make HDFS healthy again.
hdfs dfs -rm /file-with-missing-corrupt-blocks
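If there are many such files, fsck itself can delete every corrupted file in one pass with its -delete option. Be careful: the data in those files is gone once you run it.

hdfs fsck / -delete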