Apache Pig Tutorial – Grouping Records
December 19, 2015
Apache Pig Tutorial – Executing as a Script
December 20, 2015
Apache Pig Tutorial – Ordering Records
The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don’t expect lengthy posts. All posts will be short and sweet. Most posts will have a (very short) “see it in action” video.
In the previous post we looked at how to group records, and we also found the average volume of stocks for the year 2003. In this post we will see how to order, or sort, records using Apache Pig.
First, let's load the data, group it by symbol, and find the average volume for each stock symbol for the year 2003.
grunt> stocks = LOAD '/user/hirw/input/stocks' USING PigStorage(',') as (exchange:chararray, symbol:chararray, date:datetime, open:float, high:float, low:float, close:float, volume:int, adj_close:float);
grunt> filter_by_yr = FILTER stocks by GetYear(date) == 2003;
grunt> grp_by_sym = GROUP filter_by_yr BY symbol;
grunt> avg_volume = FOREACH grp_by_sym GENERATE group, ROUND(AVG(filter_by_yr.volume)) as avgvolume;
Ordering Records
Use the ORDER operator to order the records. By default records are ordered in ascending order. Use DESC to order records in descending order.
grunt> avg_vol_ordered = ORDER avg_volume BY avgvolume DESC;
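A common follow-up to a descending sort is to keep only the first few records. The lines below are a minimal sketch of that idea, assuming avg_vol_ordered was created by the ORDER statement above; the alias top10 and the count of 10 are illustrative choices, not part of the original tutorial. The statements can be typed at the grunt prompt.

```pig
-- Assumes avg_vol_ordered is already sorted by avgvolume in descending order.
-- top10 is a hypothetical alias; 10 is an arbitrary cutoff.
top10 = LIMIT avg_vol_ordered 10;

-- Print the ten symbols with the highest average volume.
DUMP top10;
```

Applying LIMIT directly to the output of ORDER keeps the sorted order, so this yields the top records rather than an arbitrary sample.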
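We can also order by multiple columns. In the instruction below, the records are ordered first by symbol and then by average volume; group refers to the symbol column. Note that DESC applies only to the column it follows, so group is sorted in ascending order (the default) and avgvolume in descending order.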
grunt> avg_vol_ordered = ORDER avg_volume BY group, avgvolume DESC;
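Each column in the BY clause can carry its own ASC or DESC. The sketch below illustrates this by sorting on average volume first and using the symbol only to break ties. It assumes the grp_by_sym and filter_by_yr relations defined earlier; the aliases avg_volume_named and by_vol_then_sym, and the field name symbol given to group, are hypothetical names introduced here for readability.

```pig
-- Rename group to symbol (an illustrative name) so later statements read naturally.
avg_volume_named = FOREACH grp_by_sym GENERATE group AS symbol, ROUND(AVG(filter_by_yr.volume)) AS avgvolume;

-- Highest average volume first; records with equal volume are ordered by symbol, ascending.
by_vol_then_sym = ORDER avg_volume_named BY avgvolume DESC, symbol ASC;

DUMP by_vol_then_sym;
```

Writing ASC explicitly is optional, since ascending is the default, but it makes the intended direction of each column obvious.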
Display Results
grunt> DUMP avg_vol_ordered;
Previous Lesson : Grouping Records
Next Lesson : Execute as a Script