Apache Pig Tutorial – Tuple & Bag
December 31, 2015 Hadoop Mapper and Reducer Output Type Mismatch
June 22, 2016 Apache Pig Tutorial – Map
The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don’t expect lengthy posts. All posts will be short and sweet. Most posts will have a (very short) “see it in action” video.
In the previous post, we saw 2 complex types – Tuple and Bag. In this post, we will see another complex type in Pig – Map.
Sample Data
Take a look at a couple of records from the Department dataset. Fields are separated by semicolons. The first column has the department number, the second column has the department name, and the third column has the address. But the structure of the address looks weird, doesn’t it? It is a Map.
328;ADMIN HEARNG;[street#939 W El Camino,city#Chicago,state#IL]
43;ANIMAL CONTRL;[street#415 N Mary Ave,city#Chicago,state#IL]
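When you see square brackets around a field, you can infer it is a Map. A Map is nothing but a set of key-value pairs: within each pair, # separates the key from the value, and commas separate one pair from the next. Each record above has 3 key-value pairs, with the keys street, city and state.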
Load & Project a Map
Now we know how to spot a Map. Let’s see how we can load, define & project a map.
grunt> departments = LOAD '/user/hirw/input/employee-pig/department_dataset_chicago' using PigStorage(';') AS (dept_id:int, dept_name:chararray, address:map[]);
grunt> dept_addr = FOREACH departments GENERATE dept_name, address#'street' as street, address#'city' as city, address#'state' as state;
Loading is easy: for the type, simply say map[]. Address is a Map with key-value pairs. To project the value for the street key from the address column, you can say address#'street'. Similarly, for city you can say address#'city'. Note that the key goes in straight single quotes.
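Two related details are worth a quick sketch. As far as I know, Pig 0.9 and later also accept a value type inside the brackets, such as map[chararray], and looking up a key that is not present in a map gives null rather than an error. The relation names typed_depts, il_depts and with_zip, and the key 'zip', are made up here for illustration; the sample records have no zip key.

```pig
-- Assumption: every value in the address map is text, so declare map[chararray].
typed_depts = LOAD '/user/hirw/input/employee-pig/department_dataset_chicago'
              USING PigStorage(';')
              AS (dept_id:int, dept_name:chararray, address:map[chararray]);

-- A map lookup is an ordinary expression, so it can also be used in a FILTER.
il_depts = FILTER typed_depts BY address#'state' == 'IL';

-- 'zip' is a hypothetical key that the sample records do not contain;
-- the lookup yields null for those rows instead of failing.
with_zip = FOREACH il_depts GENERATE dept_name, address#'zip' AS zip;
```

Declaring the value type up front is optional. With plain map[], the values are loaded as bytearray and can be cast later when needed.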
Display Results
grunt> top100 = LIMIT dept_addr 100;
grunt> DUMP top100;
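Besides DUMP, it can help to check which types the projection produced. DESCRIBE prints the schema of a relation, and because the map was declared as an untyped map[], the looked-up values default to bytearray. One way to get chararray columns is an explicit cast. This is a minimal sketch that reuses the departments and dept_addr relations from above; dept_addr_typed is a name introduced here.

```pig
-- Print the schema of the projected relation.
DESCRIBE dept_addr;

-- Cast each looked-up value explicitly so later steps see chararray columns.
dept_addr_typed = FOREACH departments GENERATE
                      dept_name,
                      (chararray) address#'street' AS street,
                      (chararray) address#'city'   AS city,
                      (chararray) address#'state'  AS state;

-- Compare the schema after casting.
DESCRIBE dept_addr_typed;
```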