How to read and write XML files with Spark?
March 30, 2023How to check size of a bucket in S3?
April 6, 2023Let’s say we have a table name employee_multiple_depts and each employee in the table is mapped to 3 departments – dept1, dept2 and dept3.
select * from employee_multiple_depts; +--------------------------------+--------------------------------+--------------------------------+--------------------------------+--+ | employee_multiple_depts.ename | employee_multiple_depts.dept1 | employee_multiple_depts.dept2 | employee_multiple_depts.dept3 | +--------------------------------+--------------------------------+--------------------------------+--------------------------------+--+ | Jerry | 10 | 20 | 30 | | Tom | 60 | 70 | 80 | +--------------------------------+--------------------------------+--------------------------------+--------------------------------+--+
What we need is to transpose or convert each department into a row for each employee.
Solution
We will first create an array with dept1, dept2 and dept3. We then explode the array. explode is an User Defined Table generating function (UDTF) which converts a row into multiple rows.
Finally we use LATERAL VIEW. LATERAL VIEW joins resulting output exploded rows to the input rows from employee_multiple_depts providing the below output.
Check out this post if you need more information on LATERAL VIEW
SELECT ename, dept FROM employee_multiple_depts LATERAL VIEW explode(array(dept1, dept2, dept3)) dept_exp as dept; +--------+-------+--+ | ename | dept | +--------+-------+--+ | Jerry | 10 | | Jerry | 20 | | Jerry | 30 | | Tom | 60 | | Tom | 70 | | Tom | 80 | +--------+-------+--+ 6 rows selected (0.164 seconds)
1 Comment
[…] The Big Data in Real World team does a bit of data reshaping: […]