Queues vs Distributed LogsThis blog post tries to explain the typical use for Queues and for Distributed Log, but of course, a system usually uses these solutions in combination or in other ways. But I will try to summarize the usual use because some times I see some confusion about how we can use these interesting solutions.
Use a Queue when:
- You want to distribute workload between workers for the same feature.
- Each message can represent a job and don't require knowledge about other messages at the queue.
- Easy to implement.
- Unlimited/Easy horizontal scalability.
- Fault tolerance is very easy to implement (time out and re-queue of the message).
- Allow easy balance between latency (time at the queue + processing time) and the cost of the concurrent workers.
- The order is not guaranteed.
- Each message can only be consumed by one worker for one feature.
- Usually, we can have duplicates.
- Asynchronous processing of email requests.
- Processing of independent batch jobs.
Alternatives and related variations
- Queues with guaranteed order (Ex: SQS FIFO).
- TTL for messages in the Queue.
- Queues with a max amount of queued messages.
- AWS SQS, RabbitMQ
Use a Distributed Log when:
- You want to execute several functionalities for the same stream of ordered data (log aggregation, statistics calculation, event sourcing, store streams of information, etc.)
- You want that the data corresponding to the same partition key is processed in order by the same worker instance.
- Allow several functionalities for the same data. Each functionality knows what is his own offset.
- If two or more functionalities are at the same offset, they can be sure that have seen the same messages in the same order.
- Having a guaranteed order, at least at the partition key level, allow design simple solution for calculating statistics, event sourcing, near real-time calculation, etc.
- You can add new functionalities that use/process the same data with any change in the actual system (Open/Closed principle: open for extension but closed for modification).
- Usually, the order of the messages is preserved for each partition key only.
- Scalability using sharding/partitioning.
- More difficult than queues.
- Difficult to have balanced shards/partitions (hot partition problem).
- Log aggregation / distribution / statistics calculation from the logs.
- Event streaming to allow event sourcing.
- AWS Kinesis, Kafka