What is the difference between explode and posexplode functions in Hive? - Big Data In Real World

What is the difference between explode and posexplode functions in Hive?

How to change or reset consumer offset in Kafka?
May 12, 2021
How to deal with corrupt files in HDFS?
May 17, 2021
How to change or reset consumer offset in Kafka?
May 12, 2021
How to deal with corrupt files in HDFS?
May 17, 2021

Both explode and posexplode are User Defined Table generating Functions. UDTFs operate on single rows and produce multiple rows as output.

explode()

There are 2 flavors of explode, one flavor takes an Array and another takes a Map.

In the below example explode function will take in an Array and explode the array into multiple rows. So if we have 3 elements in the array we will end up with 3 rows.

0: jdbc:hive2://ms2.hirw.com:2181,wk1.hirw.co> select array('Tom','Jerry','Emily');
+--------------------------+--+
|           _c0            |
+--------------------------+--+
| ["Tom","Jerry","Emily"]  |
+--------------------------+--+
1 row selected (0.631 seconds)
0: jdbc:hive2://ms2.hirw.com:2181,wk1.hirw.co> select explode(array('Tom','Jerry','Emily'));
+--------+--+
|  col   |
+--------+--+
| Tom    |
| Jerry  |
| Emily  |
+--------+--+
3 rows selected (0.774 seconds)

Let’s also see the explode function which takes in a Map instead of an Array.

Maps are key value pairs. Here we have a Map with 3 elements with key as name and value as age. 

0: jdbc:hive2://ms2.hirw.com:2181,wk1.hirw.co> select map('Tom',10,'Jerry',20,'Emily',30);
+-----------------------------------+--+
|                _c0                |
+-----------------------------------+--+
| {"Tom":10,"Jerry":20,"Emily":30}  |
+-----------------------------------+--+
1 row selected (0.18 seconds)

Let’s apply explode()  on the map. The map has 3 key value pairs so the explode function resulted in 3 rows.

0: jdbc:hive2://ms2.hirw.com:2181,wk1.hirw.co> select explode(map('Tom',10,'Jerry',20,'Emily',30)) as (name, age);
+--------+------+--+
|  name  | age  |
+--------+------+--+
| Tom    | 10   |
| Jerry  | 20   |
| Emily  | 30   |
+--------+------+--+
3 rows selected (0.226 seconds)

posexplode()

Just like explode on array, posexplode also operates on arrays. Posexplode will take in an Array and explode the array into multiple rows and along with the elements in the array it will also give us the position of the element in the array.

Let’s see this with an example. As you can see, in addition to exploding the elements in the array the output also has the position of the element in the array.

0: jdbc:hive2://ms2.hirw.com:2181,wk1.hirw.co> select posexplode(array('Tom','Jerry','Emily')) as (pos, name);
+------+--------+--+
| pos  |  name  |
+------+--------+--+
| 0    | Tom    |
| 1    | Jerry  |
| 2    | Emily  |
+------+--------+--+
3 rows selected (0.438 seconds)

Big Data In Real World
Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.

Comments are closed.

What is the difference between explode and posexplode functions in Hive?
This website uses cookies to improve your experience. By using this website you agree to our Data Protection Policy.

Hadoop In Real World is now Big Data In Real World!

X