July 24, 2023
In modern software systems, data is often generated and consumed in real time. To handle these data streams, various processing techniques have been developed, including stream processing and message processing. However, there is often confusion about the difference between these two techniques. In this blog post, we will explore the differences between stream processing and message processing.
Once we understand the difference, we will see where Kafka and other messaging systems like RabbitMQ fit.
Stream processing and message processing are both techniques used to handle real-time data streams. However, they differ in their approach and purpose.
Message Processing
Message processing involves receiving and processing messages from a message queue or a pub/sub system. Message processing is commonly used to integrate different systems or components, and it provides a decoupling mechanism that enables different systems to work together without being tightly coupled.
The computations we apply to the messages are usually simple and, in most cases, operate on each message individually.
E.g., RabbitMQ
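To make the "one message at a time" idea concrete, here is a minimal sketch that uses Python's standard-library queue as a stand-in for a broker. It is not RabbitMQ client code; the message fields (order_id, qty, unit_price) and the handle and drain helpers are made up for illustration.

```python
import queue


def handle(message: dict) -> dict:
    """Process a single message in isolation; no state is kept between calls."""
    return {
        "order_id": message["order_id"],
        "total": message["qty"] * message["unit_price"],
    }


def drain(q: queue.Queue) -> list:
    """Take messages off the queue one by one until it is empty."""
    results = []
    while True:
        try:
            message = q.get_nowait()  # the message leaves the queue here
        except queue.Empty:
            break
        results.append(handle(message))
        q.task_done()
    return results


# In-memory stand-in for a broker queue (assumption: a real system would use a client library).
inbox = queue.Queue()
inbox.put({"order_id": 1, "qty": 2, "unit_price": 5.0})
inbox.put({"order_id": 2, "qty": 1, "unit_price": 3.5})

results = drain(inbox)
```

Each result depends only on the message that produced it, which is what makes this style easy to scale out: any consumer can handle any message.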
Stream Processing
Stream processing is the technique of processing real-time data streams as they occur: data is continuously processed and analyzed as it flows through a system. In stream processing applications or platforms, we can operate on multiple input streams and on multiple records or messages at the same time, applying complex operations such as aggregations and joins.
E.g., Kafka
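As a rough illustration of how this differs from per-message handling, the sketch below merges two input streams and counts events per user in fixed 10-second windows. It is plain Python rather than a real stream processing API; the stream names, the (timestamp, user) event shape, and the windowed_counts helper are assumptions for the example.

```python
import heapq
from collections import defaultdict


def windowed_counts(streams, window_seconds):
    """Merge time-ordered streams and count events per (window_start, user)."""
    counts = defaultdict(int)
    # heapq.merge interleaves already-sorted inputs into one sorted stream.
    for timestamp, user in heapq.merge(*streams):
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, user)] += 1  # state spans many records
    return dict(counts)


# Two hypothetical input streams of (timestamp_in_seconds, user) events.
clicks = [(1, "alice"), (4, "bob"), (12, "alice")]
purchases = [(3, "alice"), (15, "alice")]

counts = windowed_counts([clicks, purchases], window_seconds=10)
```

Unlike the per-message case, the result for a window cannot be computed from any single record: the processor has to keep state across records and across input streams.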
Kafka vs. RabbitMQ
RabbitMQ is a message processing platform. Producers publish messages to RabbitMQ, and consumers pick them up and process them. A message is removed from its queue once it has been consumed and acknowledged.
Kafka is a message processing platform at its core, but it is also a stream processing platform. Typical messaging systems do not have the ability to “rewind” and access previously delivered messages, as they are automatically deleted once all subscribed consumers have received them. In contrast, Kafka uses a pull-based model in which consumers fetch data from Kafka, and it retains messages for a configurable period of time. As a result, Kafka can store and serve messages that have already been sent, even after subscribers have consumed them.
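The contrast between delete-on-consume and retain-and-rewind can be sketched with two small in-memory classes. These are illustrative stand-ins, not the RabbitMQ or Kafka APIs: the class and method names are invented, and time-based retention is left out to keep the example short.

```python
from collections import deque


class DestructiveQueue:
    """Queue-style broker: a message is gone once it has been consumed."""

    def __init__(self):
        self._items = deque()

    def publish(self, message):
        self._items.append(message)

    def consume(self):
        return self._items.popleft() if self._items else None


class RetainedLog:
    """Log-style broker: records stay put and consumers pull by offset."""

    def __init__(self):
        self._records = []

    def append(self, message):
        self._records.append(message)
        return len(self._records) - 1  # offset of the new record

    def poll(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


q = DestructiveQueue()
q.publish("a")
q.publish("b")
first_pass = [q.consume(), q.consume()]
second_pass = q.consume()  # nothing left to read again

log = RetainedLog()
for m in ["a", "b", "c"]:
    log.append(m)

offset = 0
batch = log.poll(offset)
offset += len(batch)    # the consumer tracks its own position
replay = log.poll(0)    # "rewind": read from the start again
```

Because the consumer, not the broker, owns the offset, rereading old data is just a matter of polling from an earlier position.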
In addition, Kafka Streams lets us apply complex operations such as aggregations, joins, windowing, and other analytic operations on real-time streaming data.