Hadoop uses the underlying logged-in username to determine permissions in the cluster. When running jobs or working with HDFS, the user who started the Hadoop daemons has no access issues, because that user owns the folders in HDFS and therefore has all the necessary permissions. We are most likely to hit the AccessControlException below when a different user runs jobs or operates on HDFS and the permissions have not been configured correctly for that user.
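A quick sanity check (a minimal sketch; the second path is the staging location discussed later in this post and may differ on your cluster) is to compare the username you are running as with the owner and mode of the HDFS directories the job has to write to:

# The OS user submitting the job is the identity HDFS sees
# (assuming security/Kerberos is not enabled).
whoami

# Who owns the target directories, and what are their modes?
hadoop fs -ls /
hadoop fs -ls /tmp/hadoop-ubuntu/mapred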
AccessControlException
org.apache.hadoop.security.AccessControlException: Permission denied: user=emily.ragland, access=WRITE, inode="staging":ubuntu:supergroup:rwxr-xr-x
Exception in thread "main" org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=emily.ragland, access=WRITE, inode="staging":ubuntu:supergroup:rwxr-xr-x
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1459)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:362)
	at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
	at com.jerry.WordCount.main(WordCount.java:61)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
From the exception it is easy to see that a job running as user emily.ragland is trying to create a directory under a directory named staging. The staging folder is owned by user ubuntu, who belongs to a group named supergroup. Since other users have no write access to the staging folder (rwxr-xr-x), the write under staging fails for emily.ragland.
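Note that the failure happens on the client at submission time, before a single map task runs: the stack trace goes through Job.waitForCompletion and JobClient.submitJobInternal, i.e. the driver's submit call. A minimal sketch of that call (not the author's actual WordCount driver; mapper, reducer and path setup are elided):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        // ... mapper, reducer, input and output paths omitted ...

        // waitForCompletion() submits the job. During submission the client
        // asks the JobTracker for the staging directory and calls mkdirs()
        // to create a per-user folder under it -- that mkdirs() is what
        // throws the AccessControlException shown above.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}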
mapreduce.jobtracker.staging.root.dir
Let's first understand what the staging directory is. The mapreduce.jobtracker.staging.root.dir property in mapred-site.xml specifies the location of the staging directory in HDFS. When a job is submitted, the staging folder is used to store files common to the job, such as the job's jar. For this discussion, let's assume that user ubuntu is running all the daemons in the cluster. When the mapreduce.jobtracker.staging.root.dir property is not specified, the staging folder defaults to /tmp/hadoop-ubuntu/mapred/staging.
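That default comes from variable expansion in the stock configuration. Assuming your distribution ships the standard Hadoop 1.x defaults (the entries below are shown for illustration, not to be copied into your own config files), the resolution looks roughly like this:

<!-- core-default.xml (stock value) -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
</property>

<!-- mapred-default.xml (stock value) -->
<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>${hadoop.tmp.dir}/mapred/staging</value>
</property>

Since the JobTracker resolves the property with its own configuration and it runs as ubuntu, ${user.name} expands to ubuntu and the staging root becomes /tmp/hadoop-ubuntu/mapred/staging.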
When a user submits a job, a folder named after the username is created (if not already present) under /tmp/hadoop-ubuntu/mapred/staging. After a few job executions, the listing of the directory for user ubuntu looks like this.
ubuntu@ip-172-x-x-x:~$ hadoop fs -ls /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging
Found 6 items
drwx------   - ubuntu supergroup          0 2014-01-23 13:01 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0034
drwx------   - ubuntu supergroup          0 2014-02-01 12:57 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0097
drwx------   - ubuntu supergroup          0 2014-02-01 12:58 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0098
drwx------   - ubuntu supergroup          0 2014-02-08 13:52 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0127
drwx------   - ubuntu supergroup          0 2014-02-08 14:19 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0133
drwx------   - ubuntu supergroup          0 2014-02-08 14:32 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0139
Going back to the exception, we can see that when user emily.ragland tried to run a job, it failed because emily.ragland does not have permission to create folders under /tmp/hadoop-ubuntu/mapred/staging.
To keep things clean and for better control, let's specify the location of the staging directory explicitly by setting the mapreduce.jobtracker.staging.root.dir property in mapred-site.xml. After the property is set, restart the MapReduce daemons for it to take effect.
<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>/user</value>
</property>
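On a plain Apache Hadoop 1.x tarball install managed with the bundled scripts (an assumption about the setup; packaged distributions have their own service commands), restarting the MapReduce daemons might look like this:

# run as the user that owns the daemons (ubuntu in this example)
ubuntu@ip-172-x-x-x:~$ stop-mapred.sh
ubuntu@ip-172-x-x-x:~$ start-mapred.sh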
After the property takes effect, when emily.ragland tries to run a job, an attempt is made to create a folder like /user/emily.ragland in HDFS. Another reason I like to override the location with /user is that it is more aligned with the UNIX notion of a home folder. At this point a job run by emily.ragland still fails, with the exception below: the user does not have write access to the /user folder, which is expected since we have not yet done anything to fix the permission issue.
org.apache.hadoop.security.AccessControlException: Permission denied: user=emily.ragland, access=WRITE, inode="/user":ubuntu:supergroup:rwxr-xr-x
I have seen several suggestions online recommending a chmod of /user to 777. This is not advisable, as doing so would let any user delete or modify other users' files in HDFS. Instead, create a folder named emily.ragland under /user as the HDFS superuser (in our case ubuntu), and then change the ownership of that folder to emily.ragland.
ubuntu@ip-172-x-x-x:~$ hadoop fs -mkdir /user/emily.ragland
ubuntu@ip-172-x-x-x:~$ hadoop fs -chown emily.ragland:emily.ragland /user/emily.ragland
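Before re-running the job, it is worth confirming that the ownership change took effect; the entry for /user/emily.ragland should now show emily.ragland as both owner and group:

# quick verification, not a required step
ubuntu@ip-172-x-x-x:~$ hadoop fs -ls /user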
Now that the permissions are set, jobs should run as emily.ragland without any issues.
ubuntu@ip-172-x-x-x:~$ hadoop fs -ls /user/emily.ragland
Found 3 items
drwx------   - emily.ragland emily.ragland          0 2014-03-03 13:02 /user/emily.ragland/.staging
drwxr-xr-x   - emily.ragland emily.ragland          0 2014-03-03 13:02 /user/emily.ragland/RESULTS
-rw-r--r--   1 emily.ragland emily.ragland         12 2014-02-26 12:49 /user/emily.ragland/emily.log