What is the difference between Hive internal tables and external tables?

Published by Big Data In Real World on January 20, 2021

Internal tables

Hive manages all of the metadata of an internal table (also called a managed table), and it owns the table's data as well.

When an internal table is dropped, Hive deletes the table's data along with its metadata.

External tables

As with internal tables, Hive manages all of the metadata of an external table.

Unlike with an internal table, dropping an external table removes only its metadata. Hive leaves the table's data in place.
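To make the difference concrete, here is a toy simulation in plain Python. It does not use Hive or any Hive API: `ToyMetastore`, `create_table`, `drop_table` and the directory names are hypothetical stand-ins. The model assumes only the behavior described above: dropping a table always removes its metadata entry, and removes its data directory only when the table is internal. In HiveQL itself the choice is made at creation time, with `CREATE TABLE` for an internal table and `CREATE EXTERNAL TABLE ... LOCATION '...'` for an external one.

```python
import os
import shutil
import tempfile


class ToyMetastore:
    """Hypothetical, simplified stand-in for the Hive metastore (not Hive's API).

    It records only each table's data location and whether the table is external.
    """

    def __init__(self, warehouse_dir):
        self.warehouse_dir = warehouse_dir
        self.tables = {}  # table name -> {"location": str, "external": bool}

    def create_table(self, name, location=None, external=False):
        # Internal tables default to a directory under the warehouse;
        # external tables usually point at a location owned by another process.
        if location is None:
            location = os.path.join(self.warehouse_dir, name)
        os.makedirs(location, exist_ok=True)
        self.tables[name] = {"location": location, "external": external}
        return location

    def drop_table(self, name):
        table = self.tables.pop(name)  # the metadata is always removed
        if not table["external"]:
            # Only internal tables take their data with them.
            shutil.rmtree(table["location"], ignore_errors=True)


def demo():
    root = tempfile.mkdtemp()
    try:
        metastore = ToyMetastore(os.path.join(root, "warehouse"))

        # Internal table holding intermediate data.
        internal_dir = metastore.create_table("staging_tmp")
        with open(os.path.join(internal_dir, "part-0000"), "w") as f:
            f.write("1,a\n")

        # External table over data written by an outside process (e.g. a Spark job).
        external_dir = os.path.join(root, "landing", "events")
        metastore.create_table("events", location=external_dir, external=True)
        with open(os.path.join(external_dir, "part-0000"), "w") as f:
            f.write("2,b\n")

        metastore.drop_table("staging_tmp")
        metastore.drop_table("events")

        return {
            "tables_left": sorted(metastore.tables),
            "internal_data_exists": os.path.exists(internal_dir),
            "external_data_exists": os.path.exists(external_dir),
        }
    finally:
        shutil.rmtree(root)  # clean up the temporary directories


if __name__ == "__main__":
    print(demo())
```

Running it prints `{'tables_left': [], 'internal_data_exists': False, 'external_data_exists': True}`: both table definitions are gone, but only the external table's files survive the drop.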

When to use an internal table and when to use an external table?

A good use case for an internal table is holding intermediate data in Hive. When you drop such a table, you also want the data behind it to be deleted.

Internal tables also make sense when you frequently drop and recreate tables in Hive, since you probably don't want old data to keep accumulating every time a table is dropped.
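The accumulation problem can be sketched with another toy simulation in plain Python. It assumes an external table that is always recreated over the same location and a load that writes one new file per cycle; the function `run_cycles` and the file names are hypothetical, and nothing here talks to Hive.

```python
import os
import shutil
import tempfile


def run_cycles(external, cycles=3):
    """Hypothetical toy model: load, drop and recreate a table `cycles` times.

    A "table" here is just a data directory; nothing talks to Hive. Each load
    writes one new file, and each drop deletes the directory only when the
    table is internal. Returns the files left at the table's location.
    """
    root = tempfile.mkdtemp()
    location = os.path.join(root, "daily_load")
    try:
        for i in range(cycles):
            os.makedirs(location, exist_ok=True)  # (re)create the table
            with open(os.path.join(location, f"part-{i:04d}"), "w") as f:
                f.write(f"rows from load {i}\n")  # this cycle's load
            if i < cycles - 1:  # drop the table before the next cycle
                if not external:
                    shutil.rmtree(location)  # internal: data is dropped too
                # external: the files stay behind at the location
        return sorted(os.listdir(location))
    finally:
        shutil.rmtree(root)  # clean up the temporary directory


if __name__ == "__main__":
    print("internal:", run_cycles(external=False))
    print("external:", run_cycles(external=True))
```

The internal run ends with only `part-0002`, while the external run keeps `part-0000`, `part-0001` and `part-0002`. An external table recreated over that location would see all three loads again.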

In most other cases, external tables are the better choice. In many real-world pipelines, a Hive table is populated by external processes such as Spark jobs and read by applications outside Hive. There, Hive merely holds the metadata while the data is managed by processes outside Hive, so it makes sense to keep the data intact when the Hive table is dropped.

In short: when in doubt, create an external table.

