Stream Data From Twitter To Analyze Using Hadoop - Big Data In Real World


If you are learning Hadoop, you are probably tired of counting words and want to try some real analysis on real data. The Amazon public datasets are a good source of data. But if you want to get hands-on with real-time streaming data, Twitter is an excellent source. The Twitter streaming API delivers tweets in JSON format, which you can load into HDFS for analysis.

This post goes over the steps you need to stream data from Twitter. We will simply pull the data down using a standard Java program. What you do with the data is entirely up to you.

 

Create necessary tokens and keys from Twitter

Log in to https://dev.twitter.com. You have to create an application and generate the following keys/tokens. You can give the application any name.

Consumer Key
Consumer Secret
Access Token
Access Token Secret

This document from Twitter explains the steps to get the keys/tokens.
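
Before starting a stream, it can help to confirm that all four credentials are actually present. The following is a minimal sketch, not part of the original program: the class name CredentialCheck and the method missingKeys are hypothetical, and the property names are assumed to match the twitter.properties file used later in this post.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class CredentialCheck {

    // Assumed property names, matching twitter.properties in this post
    static final String[] REQUIRED = {
        "CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET"
    };

    /** Returns the names of required properties that are missing or blank. */
    static List<String> missingKeys(Properties prop) {
        List<String> missing = new ArrayList<String>();
        for (String name : REQUIRED) {
            String value = prop.getProperty(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Inline sample instead of a real file, so the sketch runs anywhere
        Properties prop = new Properties();
        prop.load(new StringReader("CONSUMER_KEY=abc\nCONSUMER_SECRET=\nACCESS_TOKEN=def\n"));
        // Prints: Missing: [CONSUMER_SECRET, ACCESS_TOKEN_SECRET]
        System.out.println("Missing: " + missingKeys(prop));
    }
}
```

A check like this only catches missing values; whether the keys are valid is known only once Twitter accepts the connection.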

 

Necessary Jars

Twitter4J (http://twitter4j.org) is an “unofficial” Java library for the Twitter API. Download Twitter4J from the website. The program in this post was written against version 3.0.5, so later releases may need small changes.

Add the following two jars to your build path.

twitter4j-core-3.0.5.jar
twitter4j-stream-3.0.5.jar

 

Twitter Hello World

Create a Java project in Eclipse and add the above two jars to your build path.

The program is largely self-explanatory. A separate properties file (twitter.properties) holds all the keys/tokens the program needs to run. The program listens for tweets containing predefined keywords; set the keywords you want to listen for in the TWITTER_KEYWORDS property in twitter.properties. The output is saved to a file.

Here is the Java program:

package com.jerry.hadoop.twitter;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

import twitter4j.FilterQuery;
import twitter4j.StallWarning;
import twitter4j.Status;
import twitter4j.StatusDeletionNotice;
import twitter4j.StatusListener;
import twitter4j.TwitterStream;
import twitter4j.TwitterStreamFactory;
import twitter4j.conf.ConfigurationBuilder;
import twitter4j.json.DataObjectFactory;

public class TwitterHelloWorld {

	/** The actual Twitter stream. It's set up to collect raw JSON data */
	private TwitterStream twitterStream;
	private String[] keywords;
	Properties prop = new Properties();
	FileOutputStream fos;

	public TwitterHelloWorld() {

		//load a properties file
		try {
			prop.load(new FileInputStream("twitter.properties"));

			ConfigurationBuilder cb = new ConfigurationBuilder();
			cb.setOAuthConsumerKey(prop.getProperty("CONSUMER_KEY"));
			cb.setOAuthConsumerSecret(prop.getProperty("CONSUMER_SECRET"));
			cb.setOAuthAccessToken(prop.getProperty("ACCESS_TOKEN"));
			cb.setOAuthAccessTokenSecret(prop.getProperty("ACCESS_TOKEN_SECRET"));
			cb.setJSONStoreEnabled(true);
			cb.setIncludeEntitiesEnabled(true);

			twitterStream = new TwitterStreamFactory(cb.build()).getInstance();

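		} catch (FileNotFoundException e) {
			// twitter.properties must be in the working directory; fail fast,
			// because the stream cannot be created without the keys/tokens
			throw new IllegalStateException("twitter.properties not found", e);
		} catch (IOException e) {
			throw new IllegalStateException("Could not read twitter.properties", e);
		}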

	}

	public void startTwitter() {

		try {
			fos = new FileOutputStream(new File("c:\\hadoop\\twitterstream.json"));
		} catch (IOException e) {
			e.printStackTrace();
		}

		String keywordString = prop.getProperty("TWITTER_KEYWORDS");
		keywords = keywordString.split(",");
		for (int i = 0; i < keywords.length; i++) {
			keywords[i] = keywords[i].trim();
		}

		// Set up the stream's listener (defined below)
		twitterStream.addListener(listener);

		System.out.println("Starting Twitter filter stream...");

		// Set up a filter so that only tweets matching the keywords are delivered
		FilterQuery query = new FilterQuery().track(keywords);
		twitterStream.filter(query);

	}

	public void stopTwitter() {

		System.out.println("Shutting down Twitter sample stream...");
		twitterStream.shutdown();

		try {
			fos.close();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	StatusListener listener = new StatusListener() {

		// The onStatus method is executed every time a new tweet comes in.
		public void onStatus(Status status) {
			// Echo the author, text and timestamp to the console, then
			// append the raw JSON of the tweet to the output file
			System.out.println(status.getUser().getScreenName() + ": " + status.getText());

			System.out.println("timestamp : "+ String.valueOf(status.getCreatedAt().getTime()));
			try {
				// Write one tweet per line so the file can be processed line by line
				fos.write((DataObjectFactory.getRawJSON(status) + "\n").getBytes("UTF-8"));
			} catch (IOException e) {
				e.printStackTrace();
			}

		}

		// This listener ignores every other event, but reports stream errors
		public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {}
		public void onTrackLimitationNotice(int numberOfLimitedStatuses) {}
		public void onScrubGeo(long userId, long upToStatusId) {}
		public void onException(Exception ex) {
			ex.printStackTrace();
		}
		public void onStallWarning(StallWarning warning) {}
	};

	public static void main(String[] args) throws InterruptedException {

		TwitterHelloWorld twitter = new TwitterHelloWorld();
		twitter.startTwitter();
		Thread.sleep(20000);
		twitter.stopTwitter();

	}

}
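
Once the program has run, a quick sanity check on the output file is worthwhile before loading it into HDFS. The sketch below is an illustration rather than part of the original program: the class TweetFileCheck and the method countJsonLines are hypothetical names, and it assumes each tweet was written on its own line (the program has to append a newline after each tweet for that to hold).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class TweetFileCheck {

    /** Counts lines that look like a single JSON object ({ ... }). */
    static int countJsonLines(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.startsWith("{") && trimmed.endsWith("}")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical two-tweet sample; pass a FileReader for the real output file
        String sample = "{\"id\":1,\"text\":\"hadoop\"}\n{\"id\":2,\"text\":\"big data\"}\n";
        // Prints: Tweets found: 2
        System.out.println("Tweets found: " + countJsonLines(new StringReader(sample)));
    }
}
```

This only checks the shape of each line, not that it is valid JSON; a JSON parser would be needed for a stricter check.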

 

Here is a sample twitter.properties. Substitute your own keys/tokens. Set TWITTER_KEYWORDS to a comma-separated list of the keywords you want to listen for.

CONSUMER_KEY=zy000000000000000000
CONSUMER_SECRET=fSgk00000000000000000000000000000000000000
ACCESS_TOKEN=22800000000000000000000000
ACCESS_TOKEN_SECRET=pt00000000000000000000000000
TWITTER_KEYWORDS=hadoop, big data
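
To see how a TWITTER_KEYWORDS value such as the one above turns into the array passed to the filter, here is a small standalone sketch. The class KeywordParser and the method parseKeywords are hypothetical names; unlike the split-and-trim loop in the program, this version also tolerates a missing property and drops empty entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeywordParser {

    /** Splits a comma-separated keyword string, trimming and skipping blanks. */
    static String[] parseKeywords(String raw) {
        List<String> result = new ArrayList<String>();
        if (raw == null) {
            return new String[0]; // property not set
        }
        for (String part : raw.split(",")) {
            String keyword = part.trim();
            if (!keyword.isEmpty()) {
                result.add(keyword);
            }
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // A trailing comma is ignored; "big data" stays a single phrase
        // Prints: [hadoop, big data]
        System.out.println(Arrays.toString(parseKeywords("hadoop, big data, ")));
    }
}
```

Each array element is sent as one track phrase, so "big data" is treated as a phrase of two terms rather than as two separate keywords.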

 

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.
