Apache Pig Tutorial – Loading Datasets
December 7, 2015
Apache Pig Tutorial – Project and Manipulate Columns
December 16, 2015
Apache Pig Tutorial – Load Variations
The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don’t expect lengthy posts. All posts will be short and sweet. Most posts will have a (very short) “see it in action” video.
In the previous post, we looked at how to load and display a dataset using Apache Pig. In this post, we will see different LOAD variations in Pig.
Variation 1 – Load Without Column Names or Types
grunt> stocks1 = LOAD '/user/hirw/input/stocks' USING PigStorage(',');
Variation 2 – Load With Column Names but No Types
grunt> stocks2 = LOAD '/user/hirw/input/stocks' USING PigStorage(',') as (exchange, symbol, date, open, high, low, close, volume, adj_close);
Variation 3 – Load With Column Names and Types
grunt> stocks3 = LOAD '/user/hirw/input/stocks' USING PigStorage(',') as (exchange:chararray, symbol:chararray, date:datetime, open:float, high:float, low:float, close:float, volume:int, adj_close:float);
The structure of stocks3 (Variation 3) is well defined. But what is the structure of stocks1 and stocks2? To look up the structure of a relation (for example, stocks1), use the DESCRIBE operator.
Describe Operator
Pig cannot determine the structure of stocks1 because we provided neither column names nor types.
grunt> DESCRIBE stocks1;
Schema for stocks1 unknown.
With stocks2, Pig knows the column names and assigns every column the default type, bytearray.
grunt> DESCRIBE stocks2;
stocks2: {exchange: bytearray,symbol: bytearray,date: bytearray,open: bytearray,high: bytearray,low: bytearray,close: bytearray,volume: bytearray,adj_close: bytearray}
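For comparison, here is what DESCRIBE should report for stocks3, the relation from Variation 3 that was loaded with both names and types. This is a sketch that assumes stocks3 was defined exactly as in Variation 3; the output simply echoes the types given in the AS clause.

```pig
-- stocks3 was loaded with names and types, so each column keeps its declared type
grunt> DESCRIBE stocks3;
stocks3: {exchange: chararray,symbol: chararray,date: datetime,open: float,high: float,low: float,close: float,volume: int,adj_close: float}
```

Seen side by side, the three relations differ only in how much schema Pig knows: none for stocks1, names only for stocks2, and names plus types for stocks3.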
Even with an incomplete schema, Pig is able to work with the dataset. We will see that in the next post.
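As a quick preview, the sketch below shows one way a relation with a missing or partial schema can still be used. It assumes the file has the same column order as in Variation 2 (symbol is the second field, close is the seventh); the relation names closes1 and closes2 are made up for illustration and are not part of the original post.

```pig
-- stocks1 has no schema, so fields are referenced by position ($0 is the first field)
-- Assumption: $1 is the symbol and $6 is the closing price, as in Variation 2
grunt> closes1 = FOREACH stocks1 GENERATE $1 AS symbol, (float)$6 AS close;

-- stocks2 has names but no types; an explicit cast turns a bytearray column into a float
grunt> closes2 = FOREACH stocks2 GENERATE symbol, (float)close AS close;

-- Inspect the resulting schema
grunt> DESCRIBE closes2;
```

After the cast, DESCRIBE should show close as a float, while symbol stays a bytearray because it was never given a type.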