ZooKeeper is a coordination service for distributed applications. Well, what the heck is that, right? Glad you asked. If you are looking to understand what ZooKeeper is, what it is used for, and the problem it is trying to solve in a little bit of depth, you have come to the right place. Even ZooKeeper's own documentation doesn't do a good job of explaining, in simple terms, what ZooKeeper is and what its use case is. So we thought we would give it a shot.
Imagine you are a kindergarten teacher and you have 15 kids in your class. You would like to elect a class leader. You, as the teacher, could simply pick a kid of your choice to be the class leader, but that wouldn't be fair to the other kids. So you tell your kids that the kid who jumps the highest will be the class leader. I know that is lame and a terrible method to elect a class leader, but nevertheless, in the class leader election process you, the teacher, became the coordinator.
Here, in the class leader election process, the teacher is ZooKeeper and the kids are the distributed applications.
ZooKeeper is an open source Apache project, usually installed on 3 nodes. These 3 nodes form the ZooKeeper cluster, or quorum, which is then used to coordinate certain activities between distributed applications. I know that is still not clear. Let's see how ZooKeeper is used in the real world.
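To get a feel for what "using" a ZooKeeper quorum looks like from an application's point of view, here is a minimal sketch with the official Java client. The host names zk1, zk2 and zk3 are made up; in a real cluster you would list your own 3 ZooKeeper nodes in the connection string.

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkConnect {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // The connection string lists all 3 nodes of the quorum;
        // the client picks one and fails over to another if it goes away.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        connected.await(); // wait until the session with the quorum is established
        System.out.println("Connected, session id: " + zk.getSessionId());
        zk.close();
    }
}
```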
ZooKeeper & YARN HA
In this post we will see how ZooKeeper is used to achieve high availability in YARN. The Resource Manager in YARN is a single point of failure – meaning if we have the Resource Manager running on only one node and that node fails, we have a problem and we won't be able to run any jobs in our Hadoop cluster. To keep the Resource Manager from being a single point of failure, we need to enable High Availability for it. This means we run another Resource Manager instance on another node in standby mode. With high availability configured, when the active Resource Manager fails, the standby Resource Manager can take over the role of the active Resource Manager. Without automatic failover, though, this takes manual intervention: the Hadoop administrator has to initiate the failover so the standby Resource Manager becomes active. In a busy and highly active production cluster, manual failover is not a great option because you will lose valuable time contacting the administrator and initiating the failover.
The other option we have is automatic failover. We can configure ZooKeeper to perform an automatic failover of the Resource Manager when the active Resource Manager fails. How does ZooKeeper help or facilitate the automatic failover? There are 2 things we want to understand here.
1. How is ZooKeeper used to elect the active resource manager when there is more than one resource manager? This is referred to as leader election.
2. How does ZooKeeper handle automatic failover? That is, how does a standby resource manager become active in the event the active resource manager fails?
Leader Election
Leader election is one of the most common use cases for ZooKeeper. Let's see how it works.
As an application using ZooKeeper, you can create what is called a znode. Znodes in ZooKeeper look like a file system structure with folders and files. For example, in this illustration, tom is a znode and it has two znodes under it – sam and emily; emily has two more znodes under it – john and riley. We can also embed data in each znode if we like, but not a lot of it – each znode can hold less than 1 MB of data.
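Here is a rough sketch of what creating that znode tree could look like with the ZooKeeper Java client. The paths and payloads are just the ones from the illustration, and for brevity we skip waiting for the connection event shown earlier.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeTree {
    public static void main(String[] args) throws Exception {
        // same illustrative 3-node quorum as before
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, event -> { });

        // znodes form a hierarchy, much like directories, and each one can
        // carry a small payload (well under the ~1 MB default limit)
        zk.create("/tom", "root of our tree".getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/tom/sam", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/tom/emily", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/tom/emily/john", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/tom/emily/riley", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // read a payload back
        byte[] data = zk.getData("/tom", false, null);
        System.out.println(new String(data));
        zk.close();
    }
}
```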
Let's say we have 2 resource managers. Each resource manager will participate in the leader election process to decide which one becomes the active resource manager. The resource managers use ZooKeeper to elect a leader among themselves. Let's see how it works. Both resource managers will attempt to create a znode named ActiveStandbyElectorLock in ZooKeeper, and only one resource manager will be able to create the ActiveStandbyElectorLock znode.
Whoever creates this znode first will become the active resource manager. So if resource manager 1 creates ActiveStandbyElectorLock first, it will become the active node, and if resource manager 2 creates ActiveStandbyElectorLock first, it will become the active node.
Let's say resource manager 1 created the ActiveStandbyElectorLock znode first. Resource manager 2 will also attempt to create ActiveStandbyElectorLock, but it won't be able to because the znode has already been created by resource manager 1. So resource manager 1 becomes the active resource manager and resource manager 2 goes into standby mode. So far so clear?
So we have now elected a leader. Resource manager 1 is the active resource manager now.
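Below is a simplified sketch of that election logic using the ZooKeeper Java client. The znode path is illustrative – YARN actually nests ActiveStandbyElectorLock under a base path that includes your cluster id – and the real Resource Manager does this internally through its ActiveStandbyElector, not hand-written code like this.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class RmElection {
    // illustrative path; the real one depends on yarn.resourcemanager.cluster-id
    static final String LOCK = "/yarn-leader-election/mycluster/ActiveStandbyElectorLock";

    static boolean tryToBecomeActive(ZooKeeper zk, String myId) throws Exception {
        try {
            // EPHEMERAL means the znode lives only as long as our session
            // (more on that in the next section); parent znodes are assumed to exist
            zk.create(LOCK, myId.getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;                      // we created it first: we are active
        } catch (KeeperException.NodeExistsException e) {
            return false;                     // someone beat us to it: go standby
        }
    }
}
```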
Ephemeral Node
There is an interesting thing about the ActiveStandbyElectorLock znode – it is a special kind of znode called an ephemeral node. Meaning if the process that created an ephemeral node loses its session with ZooKeeper, the znode is deleted. So when resource manager 1 goes down or loses its session with ZooKeeper, ActiveStandbyElectorLock will be deleted.
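A tiny sketch of that behaviour: one client creates an ephemeral znode and closes its session, and a second client immediately sees that the znode is gone. The path and host names are made up.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class EphemeralDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper rm1 = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, e -> { });
        ZooKeeper observer = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, e -> { });

        rm1.create("/demo-lock", new byte[0],
                   ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        System.out.println("exists while the session is alive: "
                + (observer.exists("/demo-lock", false) != null));   // true

        rm1.close();   // session ends; ZooKeeper removes the ephemeral znode
        System.out.println("exists after the session is gone: "
                + (observer.exists("/demo-lock", false) != null));   // false
        observer.close();
    }
}
```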
Persistent Node
Resource manager 1 will also create another znode named ActiveBreadCrumb. ActiveBreadCrumb is not an ephemeral node; it is another type of znode called a persistent node. Meaning when resource manager 1 goes down or loses its session with ZooKeeper, unlike the ActiveStandbyElectorLock znode, the ActiveBreadCrumb znode does not get deleted.
Resource manager 1 will write information about itself – its host name and port – into ActiveBreadCrumb. This way anyone can find out who the active resource manager is at any given time.
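Sketched with the same Java client, the idea looks like this. The path and the host:port payload are illustrative; the real Resource Manager maintains ActiveBreadCrumb for you.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class BreadCrumb {
    // illustrative path, alongside the election lock
    static final String CRUMB = "/yarn-leader-election/mycluster/ActiveBreadCrumb";

    // called by the resource manager that just won the election
    static void recordActive(ZooKeeper zk, String hostAndPort) throws Exception {
        // PERSISTENT: survives the creator's session, unlike the election lock
        zk.create(CRUMB, hostAndPort.getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }

    // anyone can ask "who is the active resource manager right now?"
    static String whoIsActive(ZooKeeper zk) throws Exception {
        return new String(zk.getData(CRUMB, false, null));
    }
}
```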
Failover
With resource manager 1 as the active node, everything works fine, up until one fine morning when the RAM in resource manager 1 goes bad and resource manager 1 goes down. Now our expectation is that resource manager 2 should become the active node automatically. Let's now understand how failover from the active node to the standby node works.
With ZooKeeper, we can add a watch to any znode. Watches are extremely helpful because if we are watching a znode and that znode gets updated or deleted, whoever is watching it gets notified.
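For example, registering a watch with the Java client looks roughly like this. Note that a standard ZooKeeper watch is a one-shot callback: it fires once, the next time the watched znode changes or disappears.

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class WatchExample {
    static void watchNode(ZooKeeper zk, String path) throws Exception {
        // exists() both checks the znode and registers the watch in one call
        zk.exists(path, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                // fires once, on the next change to this znode
                System.out.println(path + " changed: " + event.getType());
            }
        });
    }
}
```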
Remember, whoever creates the ActiveStandbyElectorLock ephemeral znode first becomes the active resource manager. In our case resource manager 1 beat resource manager 2 by creating ActiveStandbyElectorLock first. When resource manager 2 tried to create the ActiveStandbyElectorLock znode, it couldn't because the znode already existed. So resource manager 2 realized that there was already an active resource manager and it became the standby node. Before resource manager 2 goes into standby, it adds a watch on the ActiveStandbyElectorLock znode.
When resource manager 1 goes down due to the RAM failure, its session with ZooKeeper dies, and since ActiveStandbyElectorLock is an ephemeral node, ZooKeeper will delete the ActiveStandbyElectorLock znode as soon as the application that created it loses its session with ZooKeeper. Since resource manager 2 is watching the ActiveStandbyElectorLock znode, it will get a notification stating that ActiveStandbyElectorLock is gone.
Resource manager 2 will now attempt to create the ActiveStandbyElectorLock znode, and since it is the only one trying to create it, it will succeed and will become the active resource manager.
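Putting the pieces together, the standby's behaviour can be sketched like this: watch the lock znode, and when ZooKeeper reports that it has been deleted, try to grab it. This reuses the illustrative tryToBecomeActive() helper from the election sketch above; again, the real Resource Manager does this through its ActiveStandbyElector.

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class StandbyWatcher implements Watcher {
    private final ZooKeeper zk;
    private final String myId;

    StandbyWatcher(ZooKeeper zk, String myId) { this.zk = zk; this.myId = myId; }

    // call this after losing the election: watch the lock znode
    void goStandby() throws Exception {
        zk.exists(RmElection.LOCK, this);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDeleted) {
            try {
                // the old active's session died and its ephemeral lock is gone;
                // try to take over
                if (RmElection.tryToBecomeActive(zk, myId)) {
                    System.out.println(myId + " is now the active resource manager");
                } else {
                    goStandby();   // someone else got there first; keep watching
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
}
```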
Fencing
A RAM failure is not recoverable, but assume resource manager 1 experienced a connectivity issue with ZooKeeper for a brief moment. This will cause ZooKeeper to delete the ActiveStandbyElectorLock ephemeral znode, because from ZooKeeper's point of view it lost the session with resource manager 1 – it doesn't matter whether the outage was a few seconds or a few minutes. As soon as the session is lost (that is, once the session times out), ZooKeeper deletes the ActiveStandbyElectorLock ephemeral znode.
But in this case resource manager 1 is still running – at least it thinks it is still active. By this time resource manager 2 has become active. We don't want 2 resource managers thinking or acting as active at the same time; this would cause a lot of issues for the applications running in the cluster, and it is referred to as a split brain scenario.
So before resource manager 2 becomes active, it will read the ActiveBreadCrumb znode to find out who was the active node – in our case resource manager 1 – and will attempt to kill the resource manager 1 process. This is referred to as fencing. This way we are certain that there is only one active resource manager at a time. After the fencing attempt, resource manager 2 will write its own information to the ActiveBreadCrumb znode, since it is now the active resource manager.
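A very rough sketch of that sequence is below. The fence() method is a stand-in for whatever fencing action is configured (in a real cluster this is driven by Hadoop's failover machinery, not code like this), and the znode path is the same illustrative one used earlier.

```java
import org.apache.zookeeper.ZooKeeper;

public class Fencing {
    // called by resource manager 2 just before it becomes active
    static void fenceAndTakeOver(ZooKeeper zk, String myHostAndPort) throws Exception {
        // 1. read the breadcrumb to find who *was* active
        byte[] old = zk.getData(BreadCrumb.CRUMB, false, null);
        String previousActive = new String(old);

        // 2. try to make sure the previous active really is out of the picture
        //    (fence() is hypothetical; in practice it is an RPC to the old RM,
        //    an ssh-based kill, or whatever fencing method is configured)
        fence(previousActive);

        // 3. overwrite the breadcrumb with our own host:port;
        //    version -1 means "regardless of the znode's current version"
        zk.setData(BreadCrumb.CRUMB, myHostAndPort.getBytes(), -1);
    }

    static void fence(String hostAndPort) {
        // placeholder for an actual fencing action
        System.out.println("fencing previous active resource manager at " + hostAndPort);
    }
}
```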
This is how ZooKeeper is used in the leader election and failover process for YARN Resource Manager high availability. The same concept is applied to achieve NameNode high availability as well. So when you have multiple processes or applications trying to perform the same action and you need coordination between them so they don't step on each other's toes, you can use ZooKeeper as the coordinator.
If you have read so far, we promise you will love to hear what other things we have to say.