October 25, 2021
In this post we explain the architecture of Hive, the components involved, and the function of each.
HiveServer2
HiveServer2 is an improved implementation of HiveServer1, introduced in Hive 0.11. It is responsible for the following functions:
- A Thrift service that supports concurrent client connections and sessions
- Support for common ODBC and JDBC drivers
- Authentication support via Kerberos, LDAP and other pluggable implementations
- Authorization
- Query optimization and execution
HiveServer2 is a container for the Hive execution engine. For each client connection, it creates a new execution context that serves Hive SQL requests from the client.
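As a rough illustration of how a JDBC client addresses HiveServer2, the sketch below builds a connection URL of the form `jdbc:hive2://host:port/database`. The host name `hs2.example.com` and the helper `hiveserver2_jdbc_url` are hypothetical; port 10000 is assumed here because it is the usual default for the binary Thrift transport, and your cluster may differ.

```python
def hiveserver2_jdbc_url(host, port=10000, database="default", principal=None):
    """Build a HiveServer2 JDBC URL (hypothetical helper, for illustration)."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if principal:
        # On Kerberos-secured clusters the server principal is appended
        # to the URL as a session variable.
        url += f";principal={principal}"
    return url


# hs2.example.com and the principal below are placeholders, not real servers.
print(hiveserver2_jdbc_url("hs2.example.com"))
print(hiveserver2_jdbc_url("hs2.example.com", principal="hive/_HOST@EXAMPLE.COM"))
```

Each client that opens a connection with such a URL gets its own session and execution context on the server side.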
Compiler and Execution Engine
When a client submits a Hive query, the query is sent to the compiler. Hive optimizes the query, builds a query plan, turns it into an execution plan, and finally executes that plan against the data in HDFS.
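One way to see what the compiler produces is to ask Hive for the plan instead of running the query, using the `EXPLAIN` statement. The sketch below only prepares the statement text; the helper name `explain_statement` and the `employees` table are made up for illustration, and you would still submit the result through a client such as Beeline.

```python
def explain_statement(query, extended=False):
    """Wrap a HiveQL query in EXPLAIN so Hive returns the plan instead of running it."""
    # Drop surrounding whitespace and any trailing semicolon before wrapping.
    body = query.strip().rstrip(";").strip()
    if not body:
        raise ValueError("query is empty")
    keyword = "EXPLAIN EXTENDED" if extended else "EXPLAIN"
    return f"{keyword} {body};"


# The employees table is a hypothetical example.
print(explain_statement("SELECT dept, COUNT(*) FROM employees GROUP BY dept;"))
```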
Metastore database
The metastore database is not part of HiveServer2 (and it is not shown in the picture). Every Hive installation needs an RDBMS such as Derby (suitable for development environments only), Oracle, or MySQL to hold it.
Hive stores the metadata of the tables and databases it manages in the metastore database. Note that this database does not hold the actual data; the data resides in HDFS.
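Hive is pointed at its metastore database through connection properties in `hive-site.xml`. The sketch below renders the three most common ones (`javax.jdo.option.ConnectionURL`, `javax.jdo.option.ConnectionDriverName`, `javax.jdo.option.ConnectionUserName`). The function name, the MySQL host, the schema name, and the user are assumptions for illustration; a real setup also needs a password property and the JDBC driver on the classpath.

```python
import xml.etree.ElementTree as ET


def metastore_db_config(jdbc_url, driver, user):
    """Render hive-site.xml properties that point Hive at its metastore database."""
    properties = {
        "javax.jdo.option.ConnectionURL": jdbc_url,
        "javax.jdo.option.ConnectionDriverName": driver,
        "javax.jdo.option.ConnectionUserName": user,
    }
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Placeholder host, schema, and user: substitute your own values.
print(metastore_db_config(
    "jdbc:mysql://metastore-db.example.com:3306/hive_metastore",
    "com.mysql.cj.jdbc.Driver",
    "hive",
))
```

Note that nothing in this configuration refers to table data; it only tells Hive where the metadata lives.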
Metastore
In the setup described here, the metastore service runs inside HiveServer2 and communicates with the configured metastore database to look up metadata for the tables and databases that Hive manages. The metastore can also be deployed as a separate, remote service.
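Whether the metastore service runs in-process or as a separate service is typically decided by the `hive.metastore.uris` property: when it is empty, Hive uses a metastore in its own JVM, and when it lists Thrift URIs, Hive connects to those remote services. The helper below is a hypothetical sketch of that rule applied to a plain dictionary of settings; it does not read a real `hive-site.xml`.

```python
def metastore_mode(hive_conf):
    """Classify a Hive configuration dict as using an in-process or remote metastore."""
    raw = hive_conf.get("hive.metastore.uris", "").strip()
    # The property may hold several comma-separated Thrift URIs.
    uris = [u.strip() for u in raw.split(",") if u.strip()]
    if uris:
        return "remote", uris
    return "in-process", []


# ms1.example.com is a placeholder; 9083 is the usual metastore Thrift port.
print(metastore_mode({}))
print(metastore_mode({"hive.metastore.uris": "thrift://ms1.example.com:9083"}))
```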
Hive clients
Hive CLI has been deprecated in favor of Beeline. Beeline connects to HiveServer2 and acts as the client through which users run queries and see results.
Hive also supports other clients that connect to HiveServer2 over ODBC and JDBC.
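To make the Beeline workflow concrete, the sketch below assembles a Beeline command line using the `-u` (JDBC URL), `-n` (user name), and `-e` (query) options. It only builds the argument list and does not launch Beeline; the helper name, host, and user are placeholders chosen for illustration.

```python
import shlex


def beeline_command(jdbc_url, user=None, query=None):
    """Assemble a Beeline argument list (built only, never executed here)."""
    argv = ["beeline", "-u", jdbc_url]
    if user:
        argv += ["-n", user]
    if query:
        argv += ["-e", query]
    return argv


# Placeholder URL and user name: replace with values for your cluster.
cmd = beeline_command(
    "jdbc:hive2://hs2.example.com:10000/default",
    user="analyst",
    query="SHOW TABLES;",
)
print(shlex.join(cmd))  # shell-quoted form, suitable for pasting into a terminal
```

Keeping the command as a list means it can be handed to `subprocess.run` without shell quoting problems if you later decide to run it from a script.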