What is the difference between client and cluster deploy modes in Spark? - Big Data In Real World

What is the difference between client and cluster deploy modes in Spark?


This post describes the differences between the client and cluster deploy modes in Spark.

Client mode

  • The driver is launched directly within the spark-submit process, which acts as a client to the cluster.
  • The driver does not use any cluster resources if it is launched from a node outside the cluster.
  • The application cannot be tracked if the node that launched the driver loses connectivity to the Spark cluster.
  • Not ideal if the node that launches the driver is on a different network from the cluster, or if the bandwidth between the driver node and the cluster is poor.
  • Client mode is the better choice if the driver is mostly idle and not resource intensive.
  • Best for interactive applications (e.g., the Spark REPL).
  • The node that submits the job in client mode must stay up and healthy for the lifetime of the application.
  • With client mode in YARN, the driver runs in the client process, and the Application Master is used only to request resources from YARN.
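
To make the mode selection concrete, here is a minimal sketch of how a spark-submit command line could be assembled with an explicit deploy mode. The `--master` and `--deploy-mode` options are standard spark-submit flags; the helper name `spark_submit_args`, the application file `my_app.py`, and the `yarn` master are assumptions chosen only for illustration.

```python
import shlex

VALID_DEPLOY_MODES = ("client", "cluster")


def spark_submit_args(app_path, master, deploy_mode="client"):
    """Build the argv list for a spark-submit call with an explicit deploy mode.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(
            f"deploy_mode must be one of {VALID_DEPLOY_MODES}, got {deploy_mode!r}"
        )
    return ["spark-submit", "--master", master, "--deploy-mode", deploy_mode, app_path]


# Hypothetical application and master, for illustration only.
client_cmd = spark_submit_args("my_app.py", "yarn", deploy_mode="client")
print(shlex.join(client_cmd))
# prints: spark-submit --master yarn --deploy-mode client my_app.py
```

Passing `deploy_mode="cluster"` yields the same command with `--deploy-mode cluster`; nothing else about the submission has to change.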

Cluster mode

  • The driver runs as a dedicated process on one of the cluster's worker nodes, and hence it uses cluster resources.
  • Since both the driver and the worker tasks run on the cluster, we don't have to worry about connectivity failures between the driver and the cluster.
  • Cluster mode is best for busy and resource-intensive drivers.
  • Doesn't work for interactive applications (e.g., the Spark REPL).
  • The node that submits the job in cluster mode need not stay up once the application has been launched.
  • With cluster mode in YARN, the driver process runs inside the Application Master.
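
The rules of thumb from the two lists can be condensed into a small decision helper. This is only a sketch of one possible ordering of those rules, not an official Spark API; the function name `choose_deploy_mode` and its four boolean parameters are assumptions made for illustration.

```python
def choose_deploy_mode(interactive, submitter_stays_up, submitter_near_cluster, driver_is_heavy):
    """Return "client" or "cluster" based on the rules of thumb in this post.

    Hypothetical helper; the priority order of the checks is an assumption.
    """
    if interactive:
        # Interactive sessions (e.g. the Spark REPL) do not work in cluster mode.
        return "client"
    if not submitter_stays_up or not submitter_near_cluster:
        # The submitting node may go away or has a poor link to the cluster,
        # so the driver is safer running on the cluster itself.
        return "cluster"
    if driver_is_heavy:
        # Busy, resource-intensive drivers are better served by cluster resources.
        return "cluster"
    # A mostly idle driver on a stable, well-connected node can stay on the client.
    return "client"


print(choose_deploy_mode(
    interactive=False,
    submitter_stays_up=False,
    submitter_near_cluster=True,
    driver_is_heavy=False,
))
# prints: cluster
```

An interactive session always maps to client mode here, because the lists above rule out cluster mode for the REPL regardless of the other factors.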
Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.
