How does a consumer know the offset to read after restart in Kafka? - Big Data In Real World

How does a consumer know the offset to read after restart in Kafka?


Let’s say you have a consumer group with 3 consumers currently consuming messages from a topic, and for some reason you have to shut down all 3 of them. When you restart the consumers, how do they know which offset to resume from, so that they don’t re-read the messages they had already read before going down?

__consumer_offsets topic

By default, Kafka tracks the offsets committed by the consumers in a consumer group in an internal topic named __consumer_offsets. Older versions of Kafka (prior to 0.9) stored offsets in ZooKeeper.

So when we restart the consumers in the consumer group, they look up the group’s last committed offset for each assigned partition in the __consumer_offsets topic and resume reading right after the last message they had committed.
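
The lookup-and-resume cycle can be sketched with a small in-memory model. This is only an illustration, not a Kafka client: OffsetStore, consume, the group name orders-group and the topic name orders are all hypothetical names made up for this example. The one real convention it borrows is that the committed value is the offset of the next message to read (last processed offset + 1).

```python
from typing import Dict, List, Optional, Tuple

# Committed offsets are scoped per (group id, topic, partition).
OffsetKey = Tuple[str, str, int]


class OffsetStore:
    """Toy in-memory stand-in for the __consumer_offsets topic (hypothetical)."""

    def __init__(self) -> None:
        self._committed: Dict[OffsetKey, int] = {}

    def commit(self, group: str, topic: str, partition: int, next_offset: int) -> None:
        # Store the offset of the NEXT message to read (last processed + 1).
        self._committed[(group, topic, partition)] = next_offset

    def fetch(self, group: str, topic: str, partition: int) -> Optional[int]:
        # None means nothing has been committed for this group and partition.
        return self._committed.get((group, topic, partition))


def consume(log: List[str], store: OffsetStore, group: str, topic: str,
            partition: int, max_messages: int) -> List[str]:
    """Read up to max_messages from the committed position, then commit."""
    start = store.fetch(group, topic, partition)
    if start is None:
        start = 0  # assumption for this sketch: start from the beginning
    batch = log[start:start + max_messages]
    store.commit(group, topic, partition, start + len(batch))
    return batch


# One partition holding ten messages at offsets 0..9.
log = [f"msg-{i}" for i in range(10)]
store = OffsetStore()

before_restart = consume(log, store, "orders-group", "orders", 0, 4)
# The consumer "restarts" here; only the store survives the restart.
after_restart = consume(log, store, "orders-group", "orders", 0, 4)

print(before_restart)
print(after_restart)
```

The second call picks up at msg-4 because the position lives in the store, not in the consumer. A different group id would have its own entry and would not see this group’s progress.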

Now, let’s say we have about 1000 messages in a Kafka topic. We then create a brand new consumer group with 2 brand new consumers. Since the consumer group and its consumers are brand new, the __consumer_offsets topic has no offset stored for them. Will the consumers read from message #1 or message #1000?

auto.offset.reset config

auto.offset.reset can be set to earliest, latest, or none. In the above situation, if the config is set to earliest, the consumers start processing from message #1. If it is set to latest (the default), the consumers start at the end of the topic: they skip the 1000 existing messages and process only messages produced after that point, starting with message #1001.

If the value is set to none, the consumer throws an exception when no committed offset is found for the group. The expectation is that you would rather set the initial offset yourself and are willing to handle out of range errors manually.

auto.offset.reset applies only when the consumer group has no valid stored offset to resume from.

It applies in the following situations:

  • the first time a consumer group consumes from a topic, so no offset has been stored yet
  • a consumer ran before but never committed any offsets, so nothing is stored the next time it starts
  • the consumer group’s committed offsets have expired (the broker setting offsets.retention.minutes keeps them for 7 days by default with modern brokers)
  • the message the stored offset points to has been removed by the message retention policy, so the stored offset is out of range
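
The decision described in this section can be sketched as a single function. This is a simplified model under stated assumptions, not client code: resolve_start_offset and NoOffsetError are hypothetical names, and offsets are zero-based as in Kafka, so a topic with 1000 messages has offsets 0 to 999 and a log end of 1000. With latest the consumer starts at the log end, which means it reads only messages produced afterwards.

```python
from typing import Optional


class NoOffsetError(Exception):
    """Hypothetical stand-in for the error a client raises under 'none'."""


def resolve_start_offset(committed: Optional[int], log_start: int,
                         log_end: int, policy: str = "latest") -> int:
    """Return the offset a consumer starts reading from for one partition.

    committed: offset stored for the group, or None if nothing is stored
    log_start: offset of the oldest message still retained
    log_end:   offset the next produced message will receive
    policy:    value of auto.offset.reset
    """
    # A stored offset that is still within the log wins; the policy is ignored.
    if committed is not None and log_start <= committed <= log_end:
        return committed

    # No stored offset (new group, nothing committed, offsets expired) or
    # the stored offset is out of range (data removed by retention).
    if policy == "earliest":
        return log_start
    if policy == "latest":
        return log_end
    if policy == "none":
        raise NoOffsetError("no valid committed offset; set the position manually")
    raise ValueError(f"unknown auto.offset.reset value: {policy}")


# Brand new group, 1000 messages at offsets 0..999.
print(resolve_start_offset(None, 0, 1000, "earliest"))
print(resolve_start_offset(None, 0, 1000, "latest"))

# Existing group with a valid stored offset: the policy does not matter.
print(resolve_start_offset(400, 0, 1000, "earliest"))

# Stored offset 100 points at data removed by retention (log now starts at 250).
print(resolve_start_offset(100, 250, 1000, "earliest"))
```

The first branch is the important one: as long as a valid offset is stored for the group, changing auto.offset.reset has no effect on where the consumers resume.
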
