Troubleshooting a Failing Hive Action in Oozie
February 2, 2014
In this post we are going to troubleshoot a failing Hive action in Oozie. The Hive action itself is very simple: it tries to add a partition to a table in Hive. We will go through the logs to analyze the problem, assuming the actions are executing on a Hadoop cluster.
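For context, the Hive script behind the action is roughly of the shape sketched below. The table name tweets comes from the error we will find later in the logs; the partition column, value, and HDFS location are hypothetical placeholders, not the actual ones from this workflow.

-- add-partition.q (sketch): the kind of statement the Hive action runs.
-- The partition column (datehour) and the location are illustrative placeholders.
ALTER TABLE tweets ADD IF NOT EXISTS
  PARTITION (datehour = '2014012600')
  LOCATION '/user/ubuntu/tweets/2014/01/26/00';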
Environment
Oozie – 4.0.0
Hive – 0.11.0
Hadoop – 1.2.1
Hive metastore DB – Embedded Derby (note: it is advisable to run the metastore on a standalone database like MySQL)
Analyze with Oozie commands
Start the Oozie job. Oozie will return the ID of the job it just started; take a note of it.
ubuntu@ip-172-x-x-x:~$ oozie job -oozie http://localhost:11000/oozie -config ~/oozie-workflows/job.properties -run
job: 0000092-140107010153873-oozie-ubun-C
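For reference, the job.properties passed with -config would look roughly like the sketch below. oozie.coord.application.path is the standard property pointing Oozie at the coordinator definition; the nameNode and jobTracker variable names follow the usual Oozie example conventions, and the host names are placeholders.

# job.properties (sketch) - host names and paths are placeholders
nameNode=hdfs://ec2-x-x-x.compute-1.amazonaws.com:9000
jobTracker=ec2-x-x-x.compute-1.amazonaws.com:54311
# HDFS path of the coordinator definition used in this post
oozie.coord.application.path=${nameNode}/user/ubuntu/oozie-workflows/coord-app.xml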
Query the status of the job using the job ID noted above. As the output shows, the coordinator actions are getting killed, but no error messages are displayed at this level, so we have to drill down further.
ubuntu@ip-172-x-x-x:~$ oozie job -oozie http://localhost:11000/oozie -info 0000092-140107010153873-oozie-ubun-C
Job ID : 0000092-140107010153873-oozie-ubun-C
------------------------------------------------------------------------------------------------
Job Name    : add-partition-coord
App Path    : hdfs://ec2-x-x-x.compute-1.amazonaws.com:9000/user/ubuntu/oozie-workflows/coord-app.xml
Status      : RUNNING
Start Time  : 2014-01-26 00:05 GMT
End Time    : 2014-01-26 00:35 GMT
Pause Time  : -
Concurrency : 1
------------------------------------------------------------------------------------------------
ID                                       Status    Ext ID                                  Err Code  Created               Nominal Time
0000092-140107010153873-oozie-ubun-C@1   KILLED    0000093-140107010153873-oozie-ubun-W    -         2014-01-26 12:41 GMT  2014-01-26 00:05 GMT
0000092-140107010153873-oozie-ubun-C@2   KILLED    0000094-140107010153873-oozie-ubun-W    -         2014-01-26 12:41 GMT  2014-01-26 00:15 GMT
0000092-140107010153873-oozie-ubun-C@3   RUNNING   0000095-140107010153873-oozie-ubun-W    -         2014-01-26 12:41 GMT  2014-01-26 00:25 GMT
------------------------------------------------------------------------------------------------
Copy the Ext ID of one of the killed instances and query its info. From the output below we can see it is failing at the hive-add-partition action.
ubuntu@ip-172-x-x-x:~$ oozie job -oozie http://localhost:11000/oozie -info 0000094-140107010153873-oozie-ubun-W
Job ID : 0000094-140107010153873-oozie-ubun-W
------------------------------------------------------------------------------------------------
Workflow Name  : hive-add-partition-wf
App Path       : hdfs://ec2-x-x-x.compute-1.amazonaws.com:9000/user/ubuntu/oozie-workflows/hive-action.xml
Status         : KILLED
Run            : 0
User           : ubuntu
Group          : -
Created        : 2014-01-26 12:43 GMT
Started        : 2014-01-26 12:43 GMT
Last Modified  : 2014-01-26 12:44 GMT
Ended          : 2014-01-26 12:44 GMT
CoordAction ID : 0000092-140107010153873-oozie-ubun-C@2

Actions
------------------------------------------------------------------------------------------------
ID                                                        Status  Ext ID                 Ext Status     Err Code
0000094-140107010153873-oozie-ubun-W@:start:              OK      -                      OK             -
0000094-140107010153873-oozie-ubun-W@hive-add-partition   ERROR   job_201401070051_0092  FAILED/KILLED  10001
0000094-140107010153873-oozie-ubun-W@fail                 OK      -                      OK             E0729
------------------------------------------------------------------------------------------------
Now we have the external status and error code (FAILED/KILLED, 10001), but this does not explain much. Let's drill down further, this time into the action itself.
ubuntu@ip-172-x-x-x:~$ oozie job -oozie http://localhost:11000/oozie -info 0000094-140107010153873-oozie-ubun-W@hive-add-partition
ID : 0000094-140107010153873-oozie-ubun-W@hive-add-partition
------------------------------------------------------------------------------------------------
Console URL     : http://ec2-x-x-x.compute-1.amazonaws.com:50030/jobdetails.jsp?jobid=job_201401070051_0092
Error Code      : 10001
Error Message   : Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]
External ID     : job_201401070051_0092
External Status : FAILED/KILLED
Name            : hive-add-partition
Retries         : 0
Tracker URI     : ec2-x-x-x.compute-1.amazonaws.com:54311
Type            : hive
Started         : 2014-01-26 12:43 GMT
Status          : ERROR
Ended           : 2014-01-26 12:44 GMT
------------------------------------------------------------------------------------------------
Now we have the error message: Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]. This tells us where the action failed but not the reason. We need to check the Hive logs to find out why.
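Before leaving the Oozie CLI, it is worth knowing that the launcher log for the workflow can be pulled directly with the -log option; depending on the failure, it sometimes already contains the underlying exception:

ubuntu@ip-172-x-x-x:~$ oozie job -oozie http://localhost:11000/oozie -log 0000094-140107010153873-oozie-ubun-W

In this case we get more detail from the Hive logs, so that is where we go next.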
Checking Hive Logs
When the Oozie job is executed against a Hadoop cluster, the actions could run on any of the datanodes. Note the Console URL in the action info listing above, obtained by querying the action ID with the Oozie command. The URL points to the JobTracker page for the Hadoop job we are investigating, and from the TaskTracker information there we can find the datanode where the action was executed.
Go to the location of the Hive log on that datanode. By default, hive.log will be in the /tmp/<user> folder; for me, the location is /tmp/ubuntu/hive.log.
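On a busy node hive.log can get large; a quick way to jump to the failure (adjust the path for your user) is to grep for ERROR entries with some context:

# Show ERROR entries from the Hive log with a few lines of context
grep -n -B 3 -A 10 "ERROR" /tmp/ubuntu/hive.log | tail -n 60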
Here is the error I see in the log.
2014-01-26 12:45:17,035 WARN  hive.metastore (HiveMetaStoreClient.java:open(285)) - Failed to connect to the MetaStore Server...
2014-01-26 12:45:18,037 WARN  hive.metastore (HiveMetaStoreClient.java:open(285)) - Failed to connect to the MetaStore Server...
2014-01-26 12:45:19,067 WARN  hive.metastore (HiveMetaStoreClient.java:open(285)) - Failed to connect to the MetaStore Server...
2014-01-26 12:45:20,228 ERROR ql.Driver (SessionState.java:printError(401)) - FAILED: SemanticException [Error 10001]: Table not found tweets
org.apache.hadoop.hive.ql.parse.SemanticException: Table not found tweets
    at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableAddParts(DDLSemanticAnalyzer.java:2275)
    at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:319)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:445)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:455)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:711)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
    at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:312)
    at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:270)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
    at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
The issue is that the node where the Hive action is executed (the datanode) cannot connect to the Hive metastore, and because of that the action cannot find the table in Hive. In my case the Hive metastore is running on a different node than the datanode (which is usually the case), so the action needs to know where the metastore DB is and how to connect to it.
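A quick generic check from the datanode is to test whether the metastore's Thrift endpoint is reachable at all, using the host and port configured in the hive-site.xml shown in the next section:

# From the datanode: is the endpoint from hive.metastore.uris reachable?
nc -zv ec2-x-x-x.compute-1.amazonaws.com 10000
# "Connection refused" here matches the "Failed to connect to the
# MetaStore Server" warnings in hive.log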
Fixing the problem
The Hive action refers to hive-site.xml for its connection properties. Note: Hive itself can run without a hive-site.xml (it falls back to its defaults), and the file is not provided with the binaries, but it is essential for the Hive action, so we need to create one. Here is my hive-site.xml; it is available in the Oozie workflow path in HDFS.
<?xml version="1.0" encoding="UTF-8"?> <configuration> ….. <property> <name>hive.metastore.warehouse.dir</name> <value>/user/hive/warehouse</value> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>org.apache.derby.jdbc.EmbeddedDriver</value> </property> <property> <name>hive.metastore.uris</name> <value>thrift://ec2-x-x-x.compute-1.amazonaws.com:10000</value> <description>IP address (or fully-qualified domain name) and port of the metastore host</description> </property> ….. </configuration>
hive.metastore.uris holds the location of the remote metastore; in our case we connect to it over Thrift. The configuration is correct, but I realized the HiveServer was not running. HiveServer is an optional service that allows remote clients, written in a variety of programming languages, to submit requests to Hive and retrieve results; it is built on Apache Thrift.
Start the HiveServer using the command below on the node where Hive is configured.
hive --service hiveserver
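That command occupies the foreground shell, so in practice you may want to run it in the background and confirm the Thrift port (10000 by default) is listening; for example:

# Run HiveServer in the background and verify the Thrift port is open
nohup hive --service hiveserver > hiveserver.log 2>&1 &
netstat -tln | grep 10000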
With HiveServer running, the Oozie job now completes successfully.
ubuntu@ip-172-x-x-x:~$ oozie job -oozie http://localhost:11000/oozie -info 0000096-140107010153873-oozie-ubun-C
Job ID : 0000096-140107010153873-oozie-ubun-C
------------------------------------------------------------------------------------------------
Job Name    : add-partition-coord
App Path    : hdfs://ec2-x-x-x.compute-1.amazonaws.com:9000/user/ubuntu/oozie-workflows/coord-app.xml
Status      : RUNNING
Start Time  : 2014-01-26 00:05 GMT
End Time    : 2014-01-26 00:35 GMT
Pause Time  : -
Concurrency : 1
------------------------------------------------------------------------------------------------
ID                                       Status      Ext ID                                  Err Code  Created               Nominal Time
0000096-140107010153873-oozie-ubun-C@1   SUCCEEDED   0000097-140107010153873-oozie-ubun-W    -         2014-01-26 12:51 GMT  2014-01-26 00:05 GMT
0000096-140107010153873-oozie-ubun-C@2   SUCCEEDED   0000098-140107010153873-oozie-ubun-W    -         2014-01-26 12:51 GMT  2014-01-26 00:15 GMT
0000096-140107010153873-oozie-ubun-C@3   RUNNING     0000099-140107010153873-oozie-ubun-W    -         2014-01-26 12:51 GMT  2014-01-26 00:25 GMT
------------------------------------------------------------------------------------------------