We get a lot of questions about the differences between Spark applications, jobs, stages, and tasks, and we see a lot of misunderstanding about these topics among new learners and experienced Spark developers alike. Our goal with this post is to give you a crisp explanation of each of these concepts.
Task
A task is the smallest unit of execution in Spark. Each task executes a series of instructions on one partition of the data. For example, reading data, filtering it, and applying map() can be combined into a single task. Tasks are executed inside an executor.
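As a minimal sketch (the input file name is hypothetical), here is a chain of narrow transformations that Spark can pipeline into a single task per partition:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("task-demo").getOrCreate()
sc = spark.sparkContext

# filter() and map() are narrow transformations: no data moves between
# partitions, so Spark pipelines them with the read into one task per partition.
errors = (sc.textFile("events.log")  # "events.log" is a hypothetical input file
            .filter(lambda line: "ERROR" in line)
            .map(lambda line: line.lower()))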
Stage
A stage comprises several tasks, and every task in the stage executes the same set of instructions, each on a different partition of the data.
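To make that concrete, here is a small sketch: the number of tasks in a stage equals the number of partitions, and every task runs the same map() logic on its own slice of the data.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stage-demo").getOrCreate()
sc = spark.sparkContext

# 8 partitions means this map() stage runs as 8 parallel tasks,
# each executing identical instructions on its own partition.
doubled = sc.parallelize(range(1000), numSlices=8).map(lambda x: x * 2)
print(doubled.getNumPartitions())  # 8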
Job
A job comprises several stages. When Spark encounters a function that requires a shuffle, it creates a new stage. Transformations like reduceByKey(), join(), etc. trigger a shuffle and result in a new stage, as the sketch below shows. Spark also creates a stage when you read a dataset.
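Here is a sketch of a shuffle splitting a job into two stages. toDebugString() prints the RDD lineage, and the indentation marks the stage boundary at the shuffle:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("job-demo").getOrCreate()
sc = spark.sparkContext

pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)], 2)

# reduceByKey() must bring all values for a key together on one node,
# so it forces a shuffle and Spark starts a new stage here.
totals = pairs.reduceByKey(lambda x, y: x + y)

# The indented lineage marks the stage boundary introduced by the shuffle.
print(totals.toDebugString().decode("utf-8"))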
Application
An application comprises several jobs. A job is created whenever you execute an action, such as count() or write().
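As a sketch (the output path is illustrative), each action below triggers its own job inside the same application:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("app-demo").getOrCreate()

df = spark.range(100)

df.count()                                      # action: triggers one job
df.write.mode("overwrite").parquet("/tmp/out")  # action: triggers another job
# Both jobs belong to the single "app-demo" application and show up
# as separate entries in the Spark UI's Jobs tab.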
Summary
A Spark application can have many jobs. A job can have many stages. A stage can have many tasks. A task executes a series of instructions.