How to fix Incompatible clusterIDs error during DataNode startup?
September 1, 2021What do the numbers on the progress bar mean in Spark shell or Spark UI?
September 6, 2021LATERAL VIEW is a very powerful concept in Hive. It is used when we have to work with data with complex types. Let’s see this with an example.
Problem
Let’s say we have an employee table with employee name and an Array of department ids the employee belongs to.
select * from employee;
+-----------------+---------------------+--+ | employee.ename | employee.dept_list | +-----------------+---------------------+--+ | Tom | [20] | | Jerry | [10,20] | | Riley | [20,30,40] | +-----------------+---------------------+--+
With the data above we would like to count the number of departments the employee belongs to. The output should look like below.
+--------+------+--+ | ename | _c1 | +--------+------+--+ | Jerry | 2 | | Riley | 3 | | Tom | 1 | +--------+------+--+
This is an easy problem to solve if the data we have is not nested. But with our data the dept_list is an Array of integers providing the list of departments the employee belongs to.
Solution
Lateral view is used in conjunction with user-defined table generating functions such as explode(). A UDTF generates zero or more output rows for each input row.
Click here if you like to know the difference between UDF, UDAF and UDTF
A lateral view first applies the UDTF to each row of the base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.
LATERAL VIEW can’t function alone. It needs to be used along with a UDTF. Here we are using explode() to first explode the array to individual rows. For the exploded data we are naming the table as depts with a column dept_id.
LATERAL VIEW joins resulting output exploded rows to the input rows from employee providing the below output.
SELECT ename, dept_id FROM employee LATERAL VIEW explode(dept_list) depts AS dept_id; +--------+----------+--+ | ename | dept_id | +--------+----------+--+ | Tom | 20 | | Jerry | 10 | | Jerry | 20 | | Riley | 20 | | Riley | 30 | | Riley | 40 | +--------+----------+--+ 6 rows selected (0.16 seconds)
We can enhance the above query to group by ename and count on the dept_id to get the number of departments an employee belongs to.
SELECT ename, count(dept_id) FROM employee LATERAL VIEW explode(dept_list) depts AS dept_id GROUP BY ename; +--------+------+--+ | ename | _c1 | +--------+------+--+ | Jerry | 2 | | Riley | 3 | | Tom | 1 | +--------+------+--+ 3 rows selected (30.312 seconds)
1 Comment
[…] Check out this post if you need more information on LATERAL VIEW […]