
Try Hadoop In 5 Minutes


Let’s do this!

We are going to do three things and, trust us, it is going to take less than 5 minutes.

1. Log in to our Hadoop cluster on AWS cloud (for free)
2. Try HDFS
3. Run a MapReduce job

Log in to our Hadoop cluster on AWS cloud (for free)

You will need keys to log in to our cluster. Sign up to get your keys if you don't have them yet. Once you have the keys, follow the instructions below.

Host IP – 54.85.143.224
User Name – hirwuser150430

Click here if you don’t have the private key to connect to the cluster

Follow the steps below if you are using Windows

Download and install PuTTY

Instructions to connect from Windows
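
If you prefer a command line on Windows, PuTTY also ships with plink, which works much like ssh. A minimal sketch, assuming you have first converted the .pem key to PuTTY's .ppk format with PuTTYgen:

plink -i hirwuser150430.ppk hirwuser150430@54.85.143.224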

Follow the steps below if you are using Linux

Switch to the directory where you have downloaded your keys.

chmod 600 hirwuser150430.pem 

ssh -i hirwuser150430.pem hirwuser150430@54.85.143.224

Instructions to connect from Linux
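
Once you are logged in, you can confirm the Hadoop client is available before moving on:

hadoop version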

 

Try HDFS

Now to the fun part.

  • Create a directory
  • Upload a file to the directory
  • Delete the directory

Create a directory

Create a directory named “my-first-hdfs-directory”. Change the directory name to your liking.

hadoop fs -mkdir my-first-hdfs-directory
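
If you want to verify the directory was created, list your HDFS home directory (paths without a leading / are relative to your home directory in HDFS):

hadoop fs -ls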

Upload a file to the directory

You are now going to upload a file named sample-file.csv to the HDFS directory you created in the previous step.

hadoop fs -copyFromLocal /hirw-starterkit/hdfs/sample-file.csv my-first-hdfs-directory
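
List the directory to confirm the file made it into HDFS; you should see sample-file.csv along with its size, permissions and replication factor:

hadoop fs -ls my-first-hdfs-directory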

Delete the directory

Let’s now delete the directory which we created.

hadoop fs -rm -r my-first-hdfs-directory
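
Note that if HDFS trash is enabled on the cluster, -rm -r moves the directory to your trash folder rather than deleting it right away. Add the -skipTrash option to delete it permanently:

hadoop fs -rm -r -skipTrash my-first-hdfs-directory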

Alright, now that we have tried HDFS, let’s now try to run a MapReduce job.

Run a MapReduce job

You are now going to calculate the maximum close price of each stock from the stocks dataset. The dataset is under /user/hirw/input/stocks in HDFS, and the output will be saved under output/mapreduce/stocks.
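
To give you an idea of what runs inside the jar, here is a minimal sketch of what the mapper and reducer for this kind of job might look like. It is illustrative only: it assumes the stock symbol is the second comma-separated field and the close price is the seventh, and the actual class inside MaxClosePrice-1.0.jar may differ.

import java.io.IOException;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxClosePriceSketch {

    // Mapper: emits (symbol, closePrice) for every input record.
    public static class MaxCloseMapper
            extends Mapper<LongWritable, Text, Text, FloatWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed layout: exchange,symbol,date,open,high,low,close,volume
            String[] fields = value.toString().split(",");
            context.write(new Text(fields[1]),
                    new FloatWritable(Float.parseFloat(fields[6])));
        }
    }

    // Reducer: keeps only the maximum close price seen for each symbol.
    public static class MaxCloseReducer
            extends Reducer<Text, FloatWritable, Text, FloatWritable> {
        @Override
        protected void reduce(Text key, Iterable<FloatWritable> values,
                Context context) throws IOException, InterruptedException {
            float max = Float.NEGATIVE_INFINITY;
            for (FloatWritable v : values) {
                max = Math.max(max, v.get());
            }
            context.write(key, new FloatWritable(max));
        }
    }
}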

Delete the output directory

MapReduce jobs fail if the output directory already exists, so remove it first. It is safe to ignore the error from this command if the directory is not there yet.

hadoop fs -rm -r output/mapreduce/stocks

Run MapReduce job

hadoop jar /hirw-starterkit/mapreduce/stocks/MaxClosePrice-1.0.jar com.hirw.maxcloseprice.MaxClosePrice /user/hirw/input/stocks output/mapreduce/stocks

View output

hadoop fs -cat output/mapreduce/stocks/part-r-00000
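
The file named part-r-00000 is the output of the job's first (and in this case only) reducer. If a job is configured with multiple reducers, there will be one part file per reducer, and a glob prints them all:

hadoop fs -cat output/mapreduce/stocks/part-r-*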

Congratulations! You have now tried both HDFS and MapReduce. Want to learn more? Enroll in our free Hadoop Starter Kit course. It is 100% free, and we promise you will learn more about Hadoop in this free course than from paid courses offered by others.

Interested in chatting with us to learn about our course offering? We are just one click away.

Please note: we work Chicago hours and we don't outsource support. If we don't respond right away, you have most likely caught us during our night time. Please leave your email in the chat and we will get back to you as soon as we see your message.

