OUR GUARANTEE
Demystify Spark
Spark remains a mystery even to many who work with it every day. That's because most users don't understand how Spark works internally, how it executes jobs so efficiently, or how it interacts with other data sources. This is exactly what scares beginners away from getting into Spark as well. Don't worry, we have your back. We will untie the tangled parts of Spark and demystify it in a simple way you can understand. You are in good hands with us.
Look under the hood
We are well known for helping our students see what is under the hood of a technology. We do this because when you understand what happens behind the scenes, you are better prepared for the problems that a technology or tool may throw at you, and even better, you can design more efficient solutions knowing how things work internally. For example, our explanation of how shuffle works in Spark is something you will never find elsewhere.
Streaming & Machine Learning
We go beyond RDD, DataFrame and Dataset. Spark is much more than RDDs and Spark SQL; it is a data analytics platform. So we go beyond the RDDs and cover all the important and interesting modules in Spark, like Spark Streaming and Machine Learning.
Become Real World Ready
Our number one goal in all our courses is to make you production and real-world ready. We have done just that in all our courses, and Spark Developer In Real World is no exception. We cover internals, troubleshooting, optimizations, and the issues you can expect in production. We have designed this course to give you the confidence you need to land your dream job and succeed from day one on the job.
Spark and more..
Spark is an interesting tool, but real-world problems and use cases are rarely solved with Spark alone. Spark is usually used in conjunction with other tools in the big data ecosystem. So to give you a taste of what the real world looks like, we have included projects that combine Spark with other tools in the ecosystem like Kafka, HBase and Elasticsearch.
Interesting Projects
We value practice over theory, and we include a lot of interesting projects and datasets to demonstrate the concepts. This course is no exception to our principle. We use a dataset from Stack Overflow along with Elasticsearch, predict results with 2016 US presidential election data, and apply machine learning to the Yelp dataset, just to name a few.
Practice in our cluster for Free
Practicing Hadoop or Spark with a packaged sandbox VM on your laptop is like learning to play a guitar without a guitar. To learn these tools right, you need access to a multi-node environment. You will get free access to our multi-node cluster along with this course.
30 Day Money Back Guarantee
Don't like the course for any reason? No worries. Let us know within 30 days and we will issue a 100% refund. No questions asked.
Excellent & Caring Support
Our students' satisfaction is of utmost importance; everything else is secondary. We are here for you every step of the way, and you can count on us.
TECHNICAL HIGHLIGHTS
TOPICS
- Shuffle in Depth
- Your code to Spark tasks
- Spark with other Sources and Formats
- Catalyst Optimizer and Tungsten
- Resource Management
- Cluster Setup
- Optimizations & Troubleshooting Tips
- Spark Streaming
- Spark Machine Learning
- Spark with Kafka, Elasticsearch, HBase
PROJECTS
- PageRanking pages from Wikipedia | DataFrames & RDD
- Analyzing Trending YouTube videos (CSV & JSON) | Datasources & Formats
- Streaming with activity data from IoT devices | Spark Streaming
- Streaming data from Meetup.com with Kafka | Spark Streaming
- Predicting Country’s Happiness Rank from Happiness Score | Machine Learning
- Predicting 2016 US Elections | Machine Learning
- Predicting Yelp Rating (+ve / -ve) | Machine Learning
- Building a mini site with Stackoverflow data and Elasticsearch | End to End Project
FAQ
1. Is this course right for me?
This course is great for someone who is trying to launch a career in Big Data, or who is already working with Hadoop or other related tools and would like to move into Spark. We have designed the course in a way that will give you the confidence to attend interviews and the skills to work in a real-world production environment from day one.
2. What skills do I need to start with the course?
Basic Linux knowledge: simple commands to change directories, open/close files, etc. Basic SQL knowledge: simple selects, inserts, and simple join statements. Basic Java or Python knowledge won't hurt, because we write and walk through the programs in the projects covered in the course. But don't be intimidated if you are not a programmer. We totally understand that some students are not programmers, and we walk through all the code step by step, so it is super easy to follow. You will be in good hands; we make sure you are not lost.
3. I am looking to learn a specific tool. How do I know whether that tool is covered?
We have a detailed, up-to-date curriculum listing every topic covered in the course. Please check the curriculum below to see whether the tool you are looking for is included.
4. I am still not sure whether this course is right for me.
No worries. We totally understand. Let us know your expectations by emailing us at info@hadoopinrealworld.com and we will give you our HONEST opinion on whether this course is a good fit for you.
5. What if I have questions while I take the course?
You can ask us questions anytime by posting your questions or comments below the video in each lesson and we will answer promptly.
6. Do I get access to a Spark cluster?
Yes. You get access to a 3 node Spark cluster for free hosted in AWS.
7. I don't see a topic. Will it be added?
The Big Data ecosystem evolves fast, and we update our courses frequently, so all our courses are living courses. You can check out our release schedule at https://www.bigdatainrealworld.com/upcoming-releases/
8. Do I get lifetime access?
Yes. Absolutely. You get lifetime access to the course, all the future updates to the course and lifetime access to the cluster.
CURRICULUM
Chapter 1: Let's Get Started
- Thank you and Welcome | 11:35
- Tools and Setup | 8:30
Chapter 2: Introduction To Spark
- Hadoop vs. Spark - Who Wins | 15:30
- Challenges Spark Tries To Address | 12:24
- How Spark Is Faster Than Hadoop | 8:39
Chapter 3: RDD - Core Of Spark
- The Need For RDD | 11:29
- What Is RDD | 12:30
- What An RDD Is Not | 7:31
Chapter 4: Execution In Spark (Behind the scenes)
- First Program In Spark | 16:04
- What are Dependencies and Why They are Important | 11:11
- Program to Execution | Part 1 | 13:01
- Program to Execution | Part 2 | 19:10
- Caching Data In Spark | 15:04
- Fault Tolerance | 7:34
Chapter 5: Shuffle in Spark
- Need for Shuffle | 10:45
- Hash Shuffle Manager - Part 1 | 11:44
- Hash Shuffle Manager - Part 2 | 14:29
- Sort Shuffle Manager | 8:15
Chapter 6: Spark Transformations
- reduceByKey vs groupByKey | 9:34
- Cogroup, Join and Avoiding Shuffle - Part 1 | 14:19
- Cogroup, Join and Avoiding Shuffle - Part 2 | 8:23
- Resizing Partitions | 7:46
Chapter 7: PageRanking with RDDs
- PageRanking Algorithm
- PageRank Walk-through
- Implementing PageRank with RDDs
Chapter 8: Beyond RDDs
- What's the Problem with RDDs | 11:53
- DataFrame vs DataSet vs SQL | 12:25
- Simple Selects | 8:26
- Filtering DataFrames | 2:24
- Aggregating DataFrames | 5:19
- Joining DataFrames | 8:20
- PageRanking with DataFrames | 16:39
Chapter 9: Spark with Other Datasources & File Formats
- Spark & Hive | 8:26
- Spark & Hive with XML, Parquet & ORC | 14:23
- Spark & RDBMS | 8:49
- Spark & HBase | Part - 1 | 18:47
- Spark & HBase | Part - 2 | 9:03
Chapter 10: Spark Optimizations
- Number of Tasks
- Join Algorithms
- Picking a Join Algorithm
- Join Hints
Chapter 11: Spark - Under the Hood
- Inside the Catalyst Optimizer | 12:05
- Catalyst Optimizer - Plan Walkthrough | 6:27
- Project Tungsten - Better Memory Management | 13:09
- Project Tungsten - CPU Cache Aware Optimizations | 11:05
Chapter 12: Resource Management
- Spark Architecture
- Memory Layout In Executor
- Resource Management - Standalone
- Resource Management - YARN
- Dynamic Resource Allocation | 7:47
Chapter 13: Cluster Installation
- Spark Installation | 5:28
- Hadoop Cluster Setup | Part 1 | 23:43
- Hadoop Cluster Setup | Part 2 | 25:35
- Hadoop Cluster Setup | Part 3 | 18:01
Chapter 14: An end to end project (Spark, Elasticsearch, Kibana, REST and Angular)
- End to End Project Introduction | 8:09
- Elasticsearch (A quick introduction) | 8:18
- Hands-on with Elasticsearch | 10:45
- Stackoverflow Dataset | 8:58
- Spark ETL | 12:53
- Visualizations with Kibana | 8:44
- REST Service with Spring framework | 19:29
- Building an Angular application | 12:28
Chapter 15: Introduction to Kafka
- Kafka - The Why and the What | 8:43
- Key Concepts | 12:32
- Experiments with Kafka | 19:18
Chapter 16: Machine Learning
- Introduction to Machine Learning | 11:38
- Machine Learning Blueprint | 5:49
- Feature Engineering | 10:39
- Linear Regression | 8:17
- World Happiness Project
- Decision Trees | 9:55
- Random Forest | 3:14
- Predicting 2016 US Elections | 11:46
- Predicting Yelp Ratings (+ve or -ve) | 15:55
Chapter 17: Streaming with Spark
- Why Streaming and How Spark Does Streaming | 11:51
- Core Concepts in Streaming | 8:36
- Output Modes With Non Aggregate Queries | 13:40
- Output Modes With Aggregate Queries | 8:50
- Event Time, Window and Late Events | 10:39
- Handling Late Events In Streaming | 10:47
- Late Events and Append Mode | 8:05
- Streaming Meetup with Spark | Part 1 | 5:31
- Streaming Meetup with Spark | Part 2 | 8:53
Chapter 18: A Short Chapter On Scala
- Introduction to Scala | 12:05
- First Program in Scala | not HelloWorld | 11:45
- Scala Functions | 11:43