Finding the MAX tuple with Pig - Big Data In Real World

Here is a sample dataset. Our goal is to find the record with the maximum record_value, which is (DEF, 300).

record_key, record_value
ABC,100
DEF,300
GHI,40
XYZ,150

Script

Here is a short, simple script that finds the record with the max value.

dataset = LOAD 'max-records-test.txt' USING PigStorage(',') AS (record_key: chararray, record_value: long);
A = GROUP dataset ALL;
B = FOREACH A GENERATE MAX(dataset.record_value) AS max_val;
C = FILTER dataset BY record_value == (long)B.max_val;
DUMP C;

Explanation

GROUP dataset ALL groups every record in the dataset into a single tuple (row). Because all records go into one group, the result is just one tuple with 2 columns. The 1st column is the group key, all. The 2nd column is the interesting one: it is a nested column, a bag holding every tuple from the dataset. The output of the group operation looks like this:

(all, {(ABC,100),(DEF,300),(GHI,40),(XYZ,150)})
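
One way to confirm this shape is Pig's DESCRIBE operator, which prints the schema of a relation. The following is a minimal sketch, assuming the same input file and aliases as the script above; it should report a chararray field named group and a bag named dataset holding (record_key, record_value) tuples.

```pig
-- Same LOAD and GROUP as in the script above.
dataset = LOAD 'max-records-test.txt' USING PigStorage(',') AS (record_key: chararray, record_value: long);
A = GROUP dataset ALL;

-- Print the schema of the grouped relation instead of its data.
DESCRIBE A;
```

The bag takes the name of the grouped relation, which is why the next step refers to dataset.record_value inside MAX.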

FOREACH A GENERATE MAX(dataset.record_value) computes the max record_value across the tuples in that bag: (ABC,100),(DEF,300),(GHI,40),(XYZ,150).

B will hold a single tuple with one field, max_val, whose value is 300.

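The next instruction filters the dataset down to the records whose record_value is 300. Because B contains exactly one row, Pig allows B.max_val to be used as a scalar inside the FILTER expression. Finally, DUMP prints the record with the maximum record_value.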

C = FILTER dataset BY record_value == (long)B.max_val;
DUMP C;
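
As a point of comparison, the same record can also be found by sorting and keeping the first row. This is an alternative sketch, not the approach used above; the aliases sorted and top_record are illustrative names, and the input file is assumed to be the same one.

```pig
-- Same LOAD as in the script above.
dataset = LOAD 'max-records-test.txt' USING PigStorage(',') AS (record_key: chararray, record_value: long);

-- Sort by record_value, largest first.
sorted = ORDER dataset BY record_value DESC;

-- Keep only the first row of the sorted relation.
top_record = LIMIT sorted 1;
DUMP top_record;
```

The two approaches differ when several records share the maximum value: the FILTER version returns all of them, while LIMIT 1 returns only one.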

 
