
Troubleshooting Hive Action In Oozie


In this post we are going to troubleshoot a failing Hive action in Oozie. The Hive action itself is very simple: it adds a partition to a table in Hive. We will go through the logs to analyze the problem, assuming the actions are executing on a Hadoop cluster.
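For reference, the Hive action inside hive-action.xml looks roughly like the sketch below. The script name and its parameter are hypothetical stand-ins, and the job-xml element is the standard way to hand a hive-site.xml to the action:

```xml
<action name="hive-add-partition">
  <hive xmlns="uri:oozie:hive-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- hive-site.xml must sit in the workflow directory in HDFS -->
    <job-xml>hive-site.xml</job-xml>
    <!-- script and parameter names below are hypothetical -->
    <script>add-partition.q</script>
    <param>dt=${dateToAdd}</param>
  </hive>
  <ok to="end"/>
  <error to="fail"/>
</action>
```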

 

Environment

Oozie – 4.0.0

Hive – 0.11.0

Hadoop – 1.2.1

Hive metastore DB – embedded Derby (note: it is advisable to run the metastore on a database like MySQL)

 

Analyze with Oozie commands

 

Start the Oozie job. Oozie will return the ID of the job it just started. Take a note of it.

ubuntu@ip-172-x-x-x:~$ oozie job -oozie http://localhost:11000/oozie -config ~/oozie-workflows/job.properties -run
job: 0000092-140107010153873-oozie-ubun-C
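The job.properties file passed with -config is not shown in this post; a minimal sketch of what it might contain, using the name node and job tracker addresses that appear later in the output (the property names are standard Oozie ones):

```
nameNode=hdfs://ec2-x-x-x.compute-1.amazonaws.com:9000
jobTracker=ec2-x-x-x.compute-1.amazonaws.com:54311
oozie.use.system.libpath=true
oozie.coord.application.path=${nameNode}/user/ubuntu/oozie-workflows/coord-app.xml
```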


Query the status of the job using the job ID noted above. As we can see, the actions are getting killed, but no error messages are displayed at this level, so we have to drill down further.

ubuntu@ip-172-x-x-x:~$ oozie job -oozie http://localhost:11000/oozie -info 0000092-140107010153873-oozie-ubun-C
Job ID : 0000092-140107010153873-oozie-ubun-C
------------------------------------------------------------------------------------------------------------------------------------
Job Name    : add-partition-coord
App Path    : hdfs://ec2-x-x-x.compute-1.amazonaws.com:9000/user/ubuntu/oozie-workflows/coord-app.xml
Status      : RUNNING
Start Time  : 2014-01-26 00:05 GMT
End Time    : 2014-01-26 00:35 GMT
Pause Time  : -
Concurrency : 1
------------------------------------------------------------------------------------------------------------------------------------
ID                                         Status    Ext ID                               Err Code  Created              Nominal Time
0000092-140107010153873-oozie-ubun-C@1     KILLED    0000093-140107010153873-oozie-ubun-W -         2014-01-26 12:41 GMT 2014-01-26 00:05 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000092-140107010153873-oozie-ubun-C@2     KILLED    0000094-140107010153873-oozie-ubun-W -         2014-01-26 12:41 GMT 2014-01-26 00:15 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000092-140107010153873-oozie-ubun-C@3     RUNNING   0000095-140107010153873-oozie-ubun-W -         2014-01-26 12:41 GMT 2014-01-26 00:25 GMT
------------------------------------------------------------------------------------------------------------------------------------


Copy the Ext ID of one of the killed instances and query the info on it. From the output below we know it is failing on the hive-add-partition action.

ubuntu@ip-172-x-x-x:~$ oozie job -oozie http://localhost:11000/oozie -info 0000094-140107010153873-oozie-ubun-W
Job ID : 0000094-140107010153873-oozie-ubun-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : hive-add-partition-wf
App Path      : hdfs://ec2-x-x-x.compute-1.amazonaws.com:9000/user/ubuntu/oozie-workflows/hive-action.xml
Status        : KILLED
Run           : 0
User          : ubuntu
Group         : -
Created       : 2014-01-26 12:43 GMT
Started       : 2014-01-26 12:43 GMT
Last Modified : 2014-01-26 12:44 GMT
Ended         : 2014-01-26 12:44 GMT
CoordAction ID: 0000092-140107010153873-oozie-ubun-C@2

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                                            Status    Ext ID                 Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000094-140107010153873-oozie-ubun-W@:start:                                  OK        -                      OK         -
------------------------------------------------------------------------------------------------------------------------------------
0000094-140107010153873-oozie-ubun-W@hive-add-partition                       ERROR     job_201401070051_0092  FAILED/KILLED 10001
------------------------------------------------------------------------------------------------------------------------------------
0000094-140107010153873-oozie-ubun-W@fail                                     OK        -                      OK         E0729
------------------------------------------------------------------------------------------------------------------------------------

 

Now we have the Ext Status and Err Code (FAILED/KILLED and 10001), but this does not explain much. Let's drill down further.

ubuntu@ip-172-x-x-x:~$ oozie job -oozie http://localhost:11000/oozie -info 0000094-140107010153873-oozie-ubun-W@hive-add-partition
ID : 0000094-140107010153873-oozie-ubun-W@hive-add-partition
------------------------------------------------------------------------------------------------------------------------------------
Console URL       : http://ec2-x-x-x.compute-1.amazonaws.com:50030/jobdetails.jsp?jobid=job_201401070051_0092
Error Code        : 10001
Error Message     : Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]
External ID       : job_201401070051_0092
External Status   : FAILED/KILLED
Name              : hive-add-partition
Retries           : 0
Tracker URI       : ec2-x-x-x.compute-1.amazonaws.com:54311
Type              : hive
Started           : 2014-01-26 12:43 GMT
Status            : ERROR
Ended             : 2014-01-26 12:44 GMT
------------------------------------------------------------------------------------------------------------------------------------

Now we have the error message: Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]. This tells us where it failed but not the reason. We need to check the Hive logs to find that out.

 

Checking Hive Logs

When the Oozie job is executed against a Hadoop cluster, the actions can run on any of the data nodes. Note the Console URL in the info listing above, obtained by querying the action ID with the Oozie command. The URL points to the Job Tracker page for the Hadoop job we are investigating. Using the information from the Task Tracker, we can find the datanode on which the action was executed.

Go to the location of the Hive log on that datanode. By default, hive.log will be in the /tmp/<user> folder; for me, the location is /tmp/ubuntu/hive.log.

Here is the error I see in the log.

2014-01-26 12:45:17,035 WARN  hive.metastore (HiveMetaStoreClient.java:open(285)) - Failed to connect to the MetaStore Server...
2014-01-26 12:45:18,037 WARN  hive.metastore (HiveMetaStoreClient.java:open(285)) - Failed to connect to the MetaStore Server...
2014-01-26 12:45:19,067 WARN  hive.metastore (HiveMetaStoreClient.java:open(285)) - Failed to connect to the MetaStore Server...
2014-01-26 12:45:20,228 ERROR ql.Driver (SessionState.java:printError(401)) - FAILED: SemanticException [Error 10001]: Table not found tweets
org.apache.hadoop.hive.ql.parse.SemanticException: Table not found tweets
at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableAddParts(DDLSemanticAnalyzer.java:2275)
at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:319)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:445)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:455)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:711)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:312)
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:270)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)

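The connection warnings repeat once per retry before the real error surfaces, so grep is a quick way to pull both out of the log. A minimal sketch below writes two sample lines as a stand-in for /tmp/ubuntu/hive.log and scans them; point the same greps at the real log:

```shell
# Two sample lines standing in for /tmp/ubuntu/hive.log
cat > /tmp/hive-sample.log <<'EOF'
2014-01-26 12:45:17,035 WARN  hive.metastore (HiveMetaStoreClient.java:open(285)) - Failed to connect to the MetaStore Server...
2014-01-26 12:45:20,228 ERROR ql.Driver (SessionState.java:printError(401)) - FAILED: SemanticException [Error 10001]: Table not found tweets
EOF

# Count the metastore connection retries
grep -c "Failed to connect to the MetaStore" /tmp/hive-sample.log   # → 1

# Surface the first real error
grep "SemanticException" /tmp/hive-sample.log
```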
The issue is that the node where the Hive action is executed (i.e. the datanode) cannot connect to the Hive metastore, and because of that the action cannot find the table in Hive. In my case the Hive metastore is running on a different node than the datanode (which is usually the case). The action needs to know where to find the metastore and how to connect to it.

The Hive action reads its connection properties from hive-site.xml. Note: Hive itself can run without a hive-site.xml, and one is not provided with the binaries, but it is essential for the Hive action. So we need to create a hive-site.xml.

Here is my hive-site.xml; it is placed in the Oozie workflow path in HDFS.

 

Fixing the problem

<?xml version="1.0" encoding="UTF-8"?>
<configuration>

…..

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://ec2-x-x-x.compute-1.amazonaws.com:10000</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>

…..

</configuration>
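Before uploading the file to the workflow path in HDFS (e.g. with hadoop fs -put hive-site.xml /user/ubuntu/oozie-workflows/), it is worth a quick check that the XML is well-formed, since the action will not pick up a broken file. A minimal sketch, using a cut-down sample file as a stand-in for the real hive-site.xml:

```shell
# Cut-down sample standing in for the real hive-site.xml
cat > /tmp/hive-site-sample.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://ec2-x-x-x.compute-1.amazonaws.com:10000</value>
  </property>
</configuration>
EOF

# Parse it; any well-formedness error raises immediately
python3 -c 'import xml.dom.minidom; xml.dom.minidom.parse("/tmp/hive-site-sample.xml"); print("well-formed")'
```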

hive.metastore.uris holds the location of the remote metastore; in our case we are connecting to it over Thrift. The configuration was correct, but I realized that HiveServer was not running. HiveServer is an optional service that allows remote clients to submit requests to Hive, using a variety of programming languages, and retrieve results; it is built on Apache Thrift.

Start HiveServer using the command below on the node where Hive is configured.

hive --service hiveserver


With HiveServer running, the Oozie job now completes successfully.

ubuntu@ip-172-x-x-x:~$ oozie job -oozie http://localhost:11000/oozie -info 0000096-140107010153873-oozie-ubun-C
Job ID : 0000096-140107010153873-oozie-ubun-C
------------------------------------------------------------------------------------------------------------------------------------
Job Name    : add-partition-coord
App Path    : hdfs://ec2-x-x-x.compute-1.amazonaws.com:9000/user/ubuntu/oozie-workflows/coord-app.xml
Status      : RUNNING
Start Time  : 2014-01-26 00:05 GMT
End Time    : 2014-01-26 00:35 GMT
Pause Time  : -
Concurrency : 1
------------------------------------------------------------------------------------------------------------------------------------
ID                                         Status    Ext ID                               Err Code  Created              Nominal Time
0000096-140107010153873-oozie-ubun-C@1     SUCCEEDED 0000097-140107010153873-oozie-ubun-W -         2014-01-26 12:51 GMT 2014-01-26 00:05 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000096-140107010153873-oozie-ubun-C@2     SUCCEEDED 0000098-140107010153873-oozie-ubun-W -         2014-01-26 12:51 GMT 2014-01-26 00:15 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000096-140107010153873-oozie-ubun-C@3     RUNNING   0000099-140107010153873-oozie-ubun-W -         2014-01-26 12:51 GMT 2014-01-26 00:25 GMT
------------------------------------------------------------------------------------------------------------------------------------