What does “Stage Skipped” mean in Apache Spark web UI?

Published by Big Data In Real World on August 9, 2021

Cached data

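If a dataset has been cached or persisted with an explicit call to cache() or persist(), a later job that depends on it can read the cached result instead of recomputing it. The stages that originally produced that data then appear as skipped in the web UI. Note that cache() and persist() are lazy: the data is only stored once an action has computed it, so the skip shows up from the second job onward.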
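To make the idea concrete, here is a toy model in plain Python. It is not Spark code: the ToyDataset class and its runs counter are hypothetical names invented for this illustration. It only mimics the behavior described above, where marking a dataset as cached lets the second action reuse the stored result.

```python
# Toy model of lazy evaluation plus caching; this is NOT Spark's API.
class ToyDataset:
    def __init__(self, compute):
        self._compute = compute      # function that produces the data
        self._cached = None          # holds the result once materialized
        self._use_cache = False
        self.runs = 0                # how many times the "stage" actually ran

    def cache(self):
        # Like Spark's cache(), this only marks the dataset; nothing runs yet.
        self._use_cache = True
        return self

    def collect(self):
        # An "action": reuse the cached result if one exists, else compute.
        if self._use_cache and self._cached is not None:
            return self._cached      # the work is skipped
        self.runs += 1
        result = self._compute()
        if self._use_cache:
            self._cached = result
        return result


squares = ToyDataset(lambda: [x * x for x in range(5)]).cache()
first = squares.collect()    # computes and stores the result
second = squares.collect()   # served from the cache, no recomputation
```

After both calls, squares.runs is 1: the second action did not recompute anything, which is the situation the web UI reports as a skipped stage.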

Shuffle data

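Spark also keeps the output of a shuffle automatically, even if you never call cache() or persist(). The map side of a shuffle writes its output to files on the executors' local disks. Because a shuffle is an expensive operation, a later job that reuses the same shuffled data reads those files instead of rerunning the stages that produced them, and those stages are reported as skipped. But note, the data will not be available forever. Shuffle files are cleaned up once the RDD that depends on them is no longer referenced and has been garbage collected. Least Recently Used (LRU) eviction under memory pressure applies to data stored with cache() or persist(), not to shuffle files.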
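The skip decision can be sketched with another toy model in plain Python. This is a simplification for illustration, not Spark's scheduler: plan_job, the parents map and the stage names are all hypothetical. The idea is that the planner walks back from the final stage and stops at any stage whose output is already available, so that stage and everything before it do not run.

```python
# Toy model of a scheduler deciding which stages to run; NOT Spark internals.
def plan_job(final_stage, parents, available):
    """Return (to_run, skipped) for a job ending at final_stage.

    parents:   dict mapping a stage to the stages it depends on
    available: set of stages whose output (e.g. shuffle files) already exists
    """
    to_run = []
    visited = set()

    def visit(stage):
        if stage in visited or stage in available:
            return                   # already handled, or output is reusable
        visited.add(stage)
        for parent in parents.get(stage, []):
            visit(parent)
        to_run.append(stage)         # parents first, then the stage itself

    visit(final_stage)

    # Everything in the lineage that did not have to run counts as skipped.
    lineage, stack = set(), [final_stage]
    while stack:
        stage = stack.pop()
        if stage not in lineage:
            lineage.add(stage)
            stack.extend(parents.get(stage, []))
    skipped = sorted(lineage - set(to_run))
    return to_run, skipped


# Hypothetical three-stage job: read input, shuffle, compute the result.
parents = {"result": ["shuffle_map"], "shuffle_map": ["read_input"]}

first_run, first_skipped = plan_job("result", parents, available=set())
second_run, second_skipped = plan_job("result", parents, available={"shuffle_map"})
```

In the first job nothing is available, so all three stages run. In the second job the shuffle output already exists, so only the final stage runs and the two earlier stages are the ones that would be shown as skipped.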

Big Data In Real World

We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
