July 31, 2023
This post describes the differences between the client and cluster deploy modes in Spark.
Client mode
- The driver is launched directly within the spark-submit process, which acts as a client to the cluster.
- The driver does not use any cluster resources if it is launched from a node outside the cluster.
- The application cannot be tracked if the node that launched the driver has connectivity issues with the Spark cluster.
- Not ideal if the node that launches the driver is on a different network from the cluster, or if the bandwidth between the driver node and the cluster is poor.
- Client mode is better if the driver is mostly idle and not resource-intensive.
- Best for interactive applications (e.g. the Spark REPL).
- The node that submits the job in client mode must stay up and healthy for the lifetime of the application.
- With client mode in YARN, the driver runs in the client process, and the Application Master is used only to request resources from YARN.
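
To make the client-mode setup concrete, here is a minimal sketch that assembles a spark-submit command line with an explicit deploy mode. The `--master`, `--deploy-mode` and `--driver-memory` flags are standard spark-submit options; the helper name `spark_submit_command`, the `yarn` master and the `app.py` file are assumptions made for illustration only.

```python
import shlex


def spark_submit_command(app, deploy_mode="client", master="yarn", driver_memory=None):
    """Build the argument list for a spark-submit call (hypothetical helper)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    argv = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if driver_memory is not None:
        # Passed on the command line so it is known before the driver starts.
        argv += ["--driver-memory", driver_memory]
    argv.append(app)  # "app.py" below is a placeholder application file
    return argv


# In client mode the driver runs inside this spark-submit process.
client_cmd = spark_submit_command("app.py", deploy_mode="client")
print(shlex.join(client_cmd))
# prints: spark-submit --master yarn --deploy-mode client app.py
```

The resulting list could be handed to `subprocess.run` on a machine where spark-submit is installed. That machine then hosts the driver and has to stay reachable until the application finishes.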
Cluster mode
- The driver runs as a dedicated process on one of the cluster's worker nodes, and hence uses cluster resources.
- Since both the driver and the executors run on the cluster, connectivity between the submitting node and the cluster no longer affects the running application.
- Cluster mode is best for busy, resource-intensive drivers.
- Doesn't work for interactive applications (e.g. the Spark REPL).
- The node that submits the job in cluster mode need not stay up once the application has been launched.
- With cluster mode in YARN, the driver process runs inside the Application Master.
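
The trade-offs listed for the two modes can be condensed into a small decision helper. This is only a rough sketch of the rules of thumb in this post, not an official Spark API: the function name `choose_deploy_mode` and its four boolean parameters are hypothetical, and real deployments may weigh other factors.

```python
def choose_deploy_mode(interactive, submitter_near_cluster, submitter_stays_up, driver_is_heavy):
    """Return "client" or "cluster" using the heuristics from this post (hypothetical helper)."""
    if interactive:
        # Interactive shells such as the Spark REPL need the driver on the submitting node.
        return "client"
    if not submitter_near_cluster or not submitter_stays_up:
        # A poor network link or a short-lived submitting node rules out client mode.
        return "cluster"
    # Busy, resource-intensive drivers are better placed on the cluster.
    return "cluster" if driver_is_heavy else "client"


# A nightly batch job submitted from a laptop over a slow VPN link:
print(choose_deploy_mode(interactive=False, submitter_near_cluster=False,
                         submitter_stays_up=False, driver_is_heavy=True))
# prints: cluster
```

The checks are ordered so that the hard constraints (interactivity, connectivity, lifetime of the submitting node) are evaluated before the softer preference based on driver load.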