{"id":616,"date":"2015-12-19T06:54:59","date_gmt":"2015-12-19T12:54:59","guid":{"rendered":"https:\/\/www.bigdatainrealworld.com\/?p=616"},"modified":"2023-03-29T07:41:47","modified_gmt":"2023-03-29T12:41:47","slug":"beginners-apache-pig-tutorial-grouping-records","status":"publish","type":"post","link":"https:\/\/www.bigdatainrealworld.com\/beginners-apache-pig-tutorial-grouping-records\/","title":{"rendered":"Apache Pig Tutorial – Grouping Records"},"content":{"rendered":"

Apache Pig Tutorial – Grouping Records<\/h1>\n

The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don\u2019t expect lengthy posts. All posts will be short and sweet. Most posts will have a (very short) \u201csee it in action\u201d video.<\/strong><\/p>\n

We have covered a lot of Apache Pig concepts so far (see the previous posts). In this post we will see how to group records using Apache Pig.<\/p>\n

With all the concepts we have seen so far, why don’t we solve a meaningful problem? Our stocks dataset has day-by-day stock information such as opening and closing prices, high, low and volume for the day, for several symbols across many years. Let’s calculate the average volume for each stock symbol for the year 2003.<\/p>\n

Load & Filter Records from 2003<\/h2>\n

See the previous posts to learn more about loading and projecting datasets.<\/p>\n

grunt> stocks = LOAD '\/user\/hirw\/input\/stocks' USING PigStorage(',') AS (exchange:chararray, symbol:chararray, date:datetime, open:float, high:float, low:float, close:float, volume:int, adj_close:float);\n\ngrunt> filter_by_yr = FILTER stocks BY GetYear(date) == 2003;<\/pre>\n

Grouping Records<\/h2>\n

Now that the dataset is loaded and filtered, let’s group the records by symbol. Grouping records is very simple: use the GROUP…BY operator as follows.<\/p>\n

grunt> grp_by_sym = GROUP filter_by_yr BY symbol;<\/pre>\n

Finding Average<\/h2>\n

Now that the records are grouped by symbol, we can do more meaningful operations on the grouped records. In order to project\u00a0grp_by_sym<\/span>\u00a0we need to know the structure of\u00a0grp_by_sym<\/span>. Use the DESCRIBE operator on\u00a0grp_by_sym<\/span>:<\/p>\n

grunt> DESCRIBE grp_by_sym;\ngrp_by_sym: {group: chararray,filter_by_yr: {(exchange: chararray,symbol: chararray,date: datetime,open: float,high: float,low: float,close: float,volume: int,adj_close: float)}}<\/pre>\n

From the DESCRIBE output we can see that grp_by_sym has two columns,\u00a0group<\/span>\u00a0and\u00a0filter_by_yr<\/span>. group<\/span>\u00a0represents the column used for grouping; in our case we grouped the records by the\u00a0symbol<\/span>\u00a0column, so\u00a0group<\/span>\u00a0holds the symbol. filter_by_yr<\/span>\u00a0is a bag that holds the list of records for a given symbol.<\/p>\n
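One way to get a feel for this two-column structure is to count the records inside each symbol's bag. The snippet below is a minimal sketch, not part of the original walkthrough. It assumes the grp_by_sym relation defined above, and the alias names rec_count, symbol and num_records are hypothetical names chosen for illustration. The statements can be typed at the grunt> prompt.

```pig
-- Sketch only: rec_count, symbol and num_records are illustrative names.
-- group is the grouping key (the stock symbol); filter_by_yr is the bag
-- holding every 2003 record that belongs to that symbol.
rec_count = FOREACH grp_by_sym GENERATE
    group AS symbol,
    COUNT(filter_by_yr) AS num_records;

-- Check the schema of the new relation before using it further.
DESCRIBE rec_count;
```

Renaming group to symbol with AS is optional, but it tends to make later projections easier to read.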

Our goal is to project the symbol and the average volume for that symbol. The FOREACH projection below achieves that with the AVG function, using ROUND to round the average to a whole number.<\/p>\n

grunt> avg_volume = FOREACH grp_by_sym GENERATE group, ROUND(AVG(filter_by_yr.volume)) as avgvolume;<\/pre>\n
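The same pattern extends to other built-in aggregate functions, because each one is applied to a field projected out of the bag (filter_by_yr.volume). The following is a hedged sketch rather than a step from the original post. It assumes the grp_by_sym relation from above, and vol_stats, min_volume, max_volume and avg_volume_rounded are hypothetical names.

```pig
-- Sketch only: all alias and field names on the left of AS are illustrative.
-- filter_by_yr.volume projects the volume field out of each symbol's bag,
-- and each aggregate then reduces that bag to a single value per symbol.
vol_stats = FOREACH grp_by_sym GENERATE
    group AS symbol,
    MIN(filter_by_yr.volume) AS min_volume,
    MAX(filter_by_yr.volume) AS max_volume,
    ROUND(AVG(filter_by_yr.volume)) AS avg_volume_rounded;

-- Confirm the resulting field names and types.
DESCRIBE vol_stats;
```

Running DESCRIBE on the result is a quick way to confirm the types that each aggregate returns before building on them.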

<\/h2>\n

Display Result<\/h2>\n
grunt> DUMP avg_volume;<\/pre>\n

<\/h2>\n

See It In Action<\/h2>\n