How to calculate the difference between 2 dates in Hive?
October 11, 2021How to find the version of Hadoop and Hive?
October 18, 2021LATERAL VIEW and EXPLODE are 2 different things in Hive. Lateral view is used in conjunction with user-defined table generating functions such as explode().
Problem
Let’s say we have an employee table with employee name and an Array of department ids the employee belongs to.
select * from employee;
+-----------------+---------------------+--+ | employee.ename | employee.dept_list | +-----------------+---------------------+--+ | Tom | [20] | | Jerry | [10,20] | | Riley | [20,30,40] | +-----------------+---------------------+--+
With the data above we would like to count the number of departments the employee belongs to. The output should look like below.
+--------+------+--+ | ename | _c1 | +--------+------+--+ | Jerry | 2 | | Riley | 3 | | Tom | 1 | +--------+------+--+
This is an easy problem to solve if the data we have is not nested. But with our data the dept_list is an Array of integers providing the list of departments the employee belongs to.
Solution
Lateral view is used in conjunction with user-defined table generating functions such as explode(). A UDTF generates zero or more output rows for each input row.
Click here if you like to know the difference between UDF, UDAF and UDTF
A lateral view first applies the UDTF to each row of the base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.
LATERAL VIEW can’t function alone. It needs to be used along with a UDTF. Here we are using explode() to first explode the array to individual rows. For the exploded data we are naming the table as depts with a column dept_id.
LATERAL VIEW joins resulting output exploded rows to the input rows from employee providing the below output.
SELECT ename, dept_id FROM employee LATERAL VIEW explode(dept_list) depts AS dept_id; +--------+----------+--+ | ename | dept_id | +--------+----------+--+ | Tom | 20 | | Jerry | 10 | | Jerry | 20 | | Riley | 20 | | Riley | 30 | | Riley | 40 | +--------+----------+--+ 6 rows selected (0.16 seconds)
We can enhance the above query to group by ename and count on the dept_id to get the number of departments an employee belongs to.
SELECT ename, count(dept_id) FROM employee LATERAL VIEW explode(dept_list) depts AS dept_id GROUP BY ename; +--------+------+--+ | ename | _c1 | +--------+------+--+ | Jerry | 2 | | Riley | 3 | | Tom | 1 | +--------+------+--+ 3 rows selected (30.312 seconds)
1 Comment
[…] + Read More Here […]