<h1>Fixing org.apache.hadoop.security.AccessControlException: Permission denied</h1>
<p>Executions in Hadoop use the logged-in username to determine permissions in the cluster. The user who started the Hadoop daemons owns the relevant folders in HDFS and therefore runs jobs and works with HDFS without any access issues. Any other user, however, is likely to hit the AccessControlException below when permissions have not been configured correctly for the user running jobs or operating on HDFS.</p>
<pre>org.apache.hadoop.security.AccessControlException: Permission denied: user=emily.ragland, access=WRITE, inode="staging":ubuntu:supergroup:rwxr-xr-x
Exception in thread "main" org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=emily.ragland, access=WRITE, inode="staging":ubuntu:supergroup:rwxr-xr-x
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1459)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:362)
	at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
	at com.jerry.WordCount.main(WordCount.java:61)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)</pre>
<p>From the exception it is easy to see that a job run as user <code>emily.ragland</code> is trying to create a directory under a directory named <code>staging</code>. The <code>staging</code> folder is owned by user <code>ubuntu</code>, who belongs to the group <code>supergroup</code>. Since other users have no write access to the <code>staging</code> folder (<code>rwxr-xr-x</code>), writes under <code>staging</code> fail for <code>emily.ragland</code>.</p>
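<p>The way the inode spec <code>ubuntu:supergroup:rwxr-xr-x</code> is evaluated follows the familiar UNIX owner/group/other model. The sketch below is not Hadoop's actual implementation — just a simplified, self-contained model of the check that produced the exception above.</p>

```java
import java.util.Set;

// Simplified model of an HDFS-style permission check -- NOT Hadoop's real
// code, just an illustration of how "ubuntu:supergroup:rwxr-xr-x" is read.
public class HdfsPermissionCheck {

    // Returns true if `user` (belonging to `userGroups`) may WRITE to an
    // inode owned by `owner`:`group` with a permission string such as
    // "rwxr-xr-x" (three triplets: owner, group, other).
    static boolean canWrite(String user, Set<String> userGroups,
                            String owner, String group, String perms) {
        if (user.equals(owner)) {
            return perms.charAt(1) == 'w';  // owner triplet, write bit
        }
        if (userGroups.contains(group)) {
            return perms.charAt(4) == 'w';  // group triplet, write bit
        }
        return perms.charAt(7) == 'w';      // other triplet, write bit
    }

    public static void main(String[] args) {
        // emily.ragland is neither the owner (ubuntu) nor in supergroup,
        // so the check falls through to the "other" bits: r-x -> no write.
        boolean allowed = canWrite("emily.ragland", Set.of("emily.ragland"),
                                   "ubuntu", "supergroup", "rwxr-xr-x");
        System.out.println("WRITE allowed: " + allowed);  // WRITE allowed: false
    }
}
```

<p>Running this prints <code>WRITE allowed: false</code>, which is exactly the situation the exception reports.</p>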
<h2>mapreduce.jobtracker.staging.root.dir</h2>
<p>Let's first understand what the staging directory is. The <code>mapreduce.jobtracker.staging.root.dir</code> property in <code>mapred-site.xml</code> specifies the location of the staging directory in HDFS. When a job is submitted, the staging folder is used to store files common to the job, such as the job's jar. For this discussion, let's assume that user <code>ubuntu</code> is running all the daemons in the cluster. When the <code>mapreduce.jobtracker.staging.root.dir</code> property is not specified, the staging folder defaults to <code>/tmp/hadoop-ubuntu/mapred/staging</code>.</p>
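<p>Where does that default path come from? If I recall the MR1 defaults correctly, the staging root defaults to <code>${hadoop.tmp.dir}/mapred/staging</code>, and <code>hadoop.tmp.dir</code> in turn defaults to <code>/tmp/hadoop-${user.name}</code>; Hadoop's Configuration class expands the <code>${...}</code> variables. A rough sketch of that substitution (a hypothetical <code>expand</code> helper, not Hadoop's real resolver):</p>

```java
// Sketch of how ${...} variables in Hadoop config values get expanded
// (simplified; Hadoop's real Configuration class resolves these recursively).
public class StagingPathDefault {

    // Hypothetical helper: substitute ${user.name} into a config value.
    static String expand(String value, String userName) {
        return value.replace("${user.name}", userName);
    }

    public static void main(String[] args) {
        // hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}; with the
        // daemons started by "ubuntu" this yields /tmp/hadoop-ubuntu.
        String hadoopTmpDir = expand("/tmp/hadoop-${user.name}", "ubuntu");

        // The staging root default appends /mapred/staging to it.
        String stagingRoot = hadoopTmpDir + "/mapred/staging";
        System.out.println(stagingRoot);  // /tmp/hadoop-ubuntu/mapred/staging
    }
}
```

<p>This matches the <code>/tmp/hadoop-ubuntu/mapred/staging</code> path seen throughout this post.</p>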
<p>When a user submits a job, a folder named after the username is created (if not already present) under <code>/tmp/hadoop-ubuntu/mapred/staging</code>. After a few job executions, the directory listing for user <code>ubuntu</code> will look like this.</p>
<pre>ubuntu@ip-172-x-x-x:~$ hadoop fs -ls /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging
Found 6 items
drwx------ - ubuntu supergroup 0 2014-01-23 13:01 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0034
drwx------ - ubuntu supergroup 0 2014-02-01 12:57 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0097
drwx------ - ubuntu supergroup 0 2014-02-01 12:58 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0098
drwx------ - ubuntu supergroup 0 2014-02-08 13:52 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0127
drwx------ - ubuntu supergroup 0 2014-02-08 14:19 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0133
drwx------ - ubuntu supergroup 0 2014-02-08 14:32 /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging/job_201401070051_0139</pre>
<p>Going back to the exception, we can now see why it happened: when user <code>emily.ragland</code> tried executing a job, the attempt to create a directory under <code>/tmp/hadoop-ubuntu/mapred/staging</code> failed, because <code>emily.ragland</code> does not have permission to create folders there.</p>
<p>To keep things clean and for better control, let's specify the location of the staging directory explicitly by setting the <code>mapreduce.jobtracker.staging.root.dir</code> property in <code>mapred-site.xml</code>. After the property is set, restart the MapReduce daemons for the property to take effect.</p>
<pre><property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>/user</value>
</property></pre>
<p>After the property takes effect, when <code>emily.ragland</code> tries to run a job, an attempt is made to create a folder like <code>/user/emily.ragland</code> in HDFS. Another reason I like to override the location with <code>/user</code> is that it aligns with the UNIX notion of a home folder. When <code>emily.ragland</code> now runs a job, it will still fail with the exception below: she still does not have access to the <code>/user</code> folder, which is expected, since we have not yet done anything to fix the permission issue.</p>
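<p>Putting the two pieces together: with the staging root set to <code>/user</code>, each submitting user gets a per-user staging area at <code>&lt;root&gt;/&lt;username&gt;/.staging</code>, mirroring the <code>.staging</code> folder seen in the listing above. A tiny sketch of that layout rule (the <code>stagingDirFor</code> helper is hypothetical, for illustration only):</p>

```java
// Illustrative helper showing how the per-user staging area is laid out
// under the configured root: root + "/" + user + "/.staging".
// (Hypothetical sketch, not Hadoop's actual path-building code.)
public class PerUserStagingDir {

    static String stagingDirFor(String rootDir, String user) {
        return rootDir + "/" + user + "/.staging";
    }

    public static void main(String[] args) {
        // With mapreduce.jobtracker.staging.root.dir = /user:
        System.out.println(stagingDirFor("/user", "emily.ragland"));
        // /user/emily.ragland/.staging

        // With the default root, the same rule reproduces the earlier path:
        System.out.println(stagingDirFor("/tmp/hadoop-ubuntu/mapred/staging", "ubuntu"));
        // /tmp/hadoop-ubuntu/mapred/staging/ubuntu/.staging
    }
}
```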
<pre>org.apache.hadoop.security.AccessControlException: Permission denied: user=emily.ragland, access=WRITE, inode="/user":ubuntu:supergroup:rwxr-xr-x</pre>
<p>I have seen several suggestions online to <code>chmod</code> <code>/user</code> to 777. This is not advisable: doing so lets any user delete or modify any other user's files in HDFS. Instead, create a folder named <code>emily.ragland</code> under <code>/user</code> as the HDFS superuser (in our case <code>ubuntu</code>, the user running the daemons), and then change the ownership of the folder to <code>emily.ragland</code>.</p>
<pre>ubuntu@ip-172-x-x-x:~$ hadoop fs -mkdir /user/emily.ragland
ubuntu@ip-172-x-x-x:~$ hadoop fs -chown emily.ragland:emily.ragland /user/emily.ragland</pre>
<p>Now that the permissions are set, jobs should run under <code>emily.ragland</code> without any issues.</p>
<pre>ubuntu@ip-172-x-x-x:~$ hadoop fs -ls /user/emily.ragland
Found 3 items
drwx------ - emily.ragland emily.ragland  0 2014-03-03 13:02 /user/emily.ragland/.staging
drwxr-xr-x - emily.ragland emily.ragland  0 2014-03-03 13:02 /user/emily.ragland/RESULTS
-rw-r--r-- 1 emily.ragland emily.ragland 12 2014-02-26 12:49 /user/emily.ragland/emily.log</pre>