Let’s say you have a consumer group with 3 consumers that are currently consuming messages from a topic. Assume that you had to shut down all 3 consumers in the consumer group for some reason. When you restart them, how do the consumers know which offset to read from, so that they don't read the messages they had already processed before going down all over again?
__consumer_offsets topic
By default, Kafka stores the offsets committed by the consumers of a consumer group in an internal topic called __consumer_offsets, keyed by consumer group, topic and partition. Older versions of Kafka (prior to 0.9) stored offsets in ZooKeeper.
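So when we restart the consumers in the consumer group, they look up the group's committed offsets in the __consumer_offsets topic and resume from there. By convention, the committed offset is the offset of the next message to read (the last processed offset + 1), so messages processed before the last commit are not read again. Messages processed after the last commit, however, will be read again.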
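To make the restart lookup concrete, here is a toy in-memory model in Python. It is only a sketch under stated assumptions, not Kafka's API: the dictionary stands in for __consumer_offsets, and the names commit, resume_position, "orders-app" and "orders" are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for __consumer_offsets (not a Kafka API):
# committed offsets keyed by (consumer group, topic, partition).
committed: Dict[Tuple[str, str, int], int] = {}

def commit(group: str, topic: str, partition: int, last_processed: int) -> None:
    # Store the offset of the NEXT message to read: last processed offset + 1.
    committed[(group, topic, partition)] = last_processed + 1

def resume_position(group: str, topic: str, partition: int) -> Optional[int]:
    # On restart, look up the group's committed offset; None means nothing is stored.
    return committed.get((group, topic, partition))

# Group "orders-app" processed offsets 0..41 of partition 0 of "orders", then shut down.
commit("orders-app", "orders", 0, last_processed=41)
print(resume_position("orders-app", "orders", 0))  # → 42
print(resume_position("new-group", "orders", 0))   # → None
```

The None returned for the brand new group is exactly the situation the next question is about: there is nothing stored to resume from.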
Now, let’s say we have 1000 messages in a Kafka topic. We then create a brand new consumer group with 2 brand new consumers. Since the consumer group and its consumers are brand new, the __consumer_offsets topic has no offset stored for them. Will the consumers start reading from message #1, or will they skip the 1000 existing messages?
auto.offset.reset config
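auto.offset.reset can be set to earliest, latest or none; the default in the Java consumer is latest. In the above situation, if the config is set to earliest, the consumers start processing from message #1, the oldest message still in the topic. If it is set to latest, the consumers start at the end of the topic: they skip the 1000 existing messages and only process messages produced after they start (message #1001 onward).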
If the value is set to none, the consumer throws an exception when no committed offset is found for the group. Use it when you would rather set the initial offset yourself and are willing to handle missing or out-of-range offset errors manually.
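The three policies can be sketched as a small Python function for a single partition. This is a toy model, not the Kafka client's API: starting_offset, NoOffsetError and the parameter names are hypothetical, offsets start at 0 (so message #1 is offset 0), and the validity checks a real consumer applies to a stored offset are left out.

```python
from typing import Optional

class NoOffsetError(Exception):
    """Hypothetical stand-in for the error a real consumer raises with auto.offset.reset=none."""

def starting_offset(committed: Optional[int], log_end: int,
                    auto_offset_reset: str = "latest", log_start: int = 0) -> int:
    """Toy model of where a consumer starts reading one partition."""
    if committed is not None:
        return committed          # a stored offset wins (validity checks skipped here)
    if auto_offset_reset == "earliest":
        return log_start          # oldest retained message (message #1 in the example)
    if auto_offset_reset == "latest":
        return log_end            # offset the next produced message will get
    if auto_offset_reset == "none":
        raise NoOffsetError("no committed offset and auto.offset.reset=none")
    raise ValueError(f"invalid auto.offset.reset value: {auto_offset_reset!r}")

# Brand new group; one partition holding 1000 messages at offsets 0..999.
print(starting_offset(None, log_end=1000, auto_offset_reset="earliest"))  # → 0
print(starting_offset(None, log_end=1000, auto_offset_reset="latest"))    # → 1000
```

With latest the returned offset 1000 does not exist yet, which is why the consumer only sees messages produced after it starts.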
auto.offset.reset only applies when the consumer group has no valid committed offset for a partition, either because none is stored or because the stored one is no longer usable.
This happens in the following situations:
- the first time a consumer group consumes from a topic, since no offset is stored yet
- if a consumer never commits any offsets (for whatever reason), the next time it is started
- if the consumer group's committed offsets have expired (7 days by default with modern brokers, controlled by the broker setting offsets.retention.minutes)
- if the message the stored offset points to has been removed by the topic's retention policy, leaving the offset out of range
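A rough way to picture these conditions is a check that returns True whenever the reset policy would decide the starting position. This is a hypothetical sketch, not broker logic: needs_reset and its parameters are illustrative names, and expiry is modelled as a single idle timer using the 7-day default mentioned above, while real brokers have more detailed rules about when that timer starts.

```python
from typing import Optional

DAY_MS = 24 * 60 * 60 * 1000

def needs_reset(committed: Optional[int], log_start: int, log_end: int,
                idle_ms: int, retention_ms: int = 7 * DAY_MS) -> bool:
    """Toy check: True when auto.offset.reset would pick the starting offset."""
    if committed is None:
        return True                                  # new group, or offsets never committed
    if idle_ms > retention_ms:
        return True                                  # committed offset expired and was dropped
    return not (log_start <= committed <= log_end)   # out of range, e.g. deleted by retention

print(needs_reset(None, 0, 1000, idle_ms=0))           # → True  (no offset stored)
print(needs_reset(500, 0, 1000, idle_ms=DAY_MS))       # → False (valid offset, resume at 500)
print(needs_reset(500, 0, 1000, idle_ms=8 * DAY_MS))   # → True  (expired)
print(needs_reset(500, 600, 1000, idle_ms=DAY_MS))     # → True  (offsets below 600 deleted)
```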