December 20, 2015
Apache Pig Tutorial – Executing as a Script
The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don’t expect lengthy posts. All posts will be short and sweet, and most will include a (very short) “see it in action” video.
So far in this series of lessons we have seen, step by step, how to calculate the average volume for stocks, and along the way we learned several key operators in Apache Pig. In this lesson we will see how to run Pig instructions as a script.
DUMP vs. STORE
The DUMP operator displays data on the screen, but more often than not we want to save the results in HDFS. The STORE operator is used for that.
With STORE we can also specify which delimiter to use when writing the results. In the example below we instruct Pig to store the records from the top10 relation into output/pig/avg-volume in HDFS, with the column delimiter specified through the PigStorage function. In this case the columns are delimited by commas.
grunt> top10 = LIMIT avg_vol_ordered 10;
grunt> STORE top10 INTO 'output/pig/avg-volume' USING PigStorage(',');
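To make the effect of the delimiter concrete, here is a minimal sketch that stores the same top10 relation twice. The two output directory names are made up for illustration. It relies on the fact that STORE falls back to PigStorage with a tab delimiter when no USING clause is given.

```pig
-- Hypothetical output paths, chosen only for this illustration.

-- No USING clause: Pig uses PigStorage with the default tab delimiter.
STORE top10 INTO 'output/pig/avg-volume-tab';

-- Explicit delimiter: each record is written as comma-separated fields.
STORE top10 INTO 'output/pig/avg-volume-csv' USING PigStorage(',');
```

Keep in mind that the output location is a directory, not a single file, and that STORE fails if the directory already exists, so remove it or pick a new path before running the statement again.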
Running Instructions as a Script
Running a series of Pig instructions is very simple: save the instructions in a file. The .pig file extension is not mandatory, but it is the convention. Execute the file as shown below.
pig /hirw-workshop/pig/scripts/average-volume.pig
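For reference, a script such as average-volume.pig could look roughly like the sketch below. The input path, the field names, and the schema are assumptions made for illustration and may differ from the dataset used in this series. Only the last three relations (avg_vol_ordered, top10, and the STORE statement) follow the statements shown earlier in this lesson.

```pig
-- average-volume.pig (hypothetical contents, schema and input path are assumed)

-- Load comma-delimited stock records from HDFS.
stocks = LOAD 'input/stocks' USING PigStorage(',')
         AS (exchange:chararray, symbol:chararray, trade_date:chararray,
             open:float, high:float, low:float, close:float,
             volume:long, adj_close:float);

-- Group the records by stock symbol.
grp_symbol = GROUP stocks BY symbol;

-- Compute the average volume for each symbol.
avg_volume = FOREACH grp_symbol
             GENERATE group AS symbol, AVG(stocks.volume) AS avg_vol;

-- Order by average volume, highest first, and keep the top 10.
avg_vol_ordered = ORDER avg_volume BY avg_vol DESC;
top10 = LIMIT avg_vol_ordered 10;

-- Write the result to HDFS as comma-delimited text.
STORE top10 INTO 'output/pig/avg-volume' USING PigStorage(',');
```

Note that the statements in a script file are written without the grunt> prompt. To try a script against the local file system instead of the cluster, Pig's local mode can be used: pig -x local average-volume.pig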
Previous Lesson : Ordering Records
Next Lesson : Executing Script with Parameters