Tuesday, September 05, 2017

Pub-Sub: the Swiss Army knife (tech pill)

Pub-Sub / Publish-Subscribe

"In software architecture, publish–subscribe is a messaging pattern where senders of messages, called publishers, do not program the messages to be sent directly to specific receivers, called subscribers, but instead categorize published messages into classes without knowledge of which subscribers, if any, there may be. Similarly, subscribers express interest in one or more classes and only receive messages that are of interest, without knowledge of which publishers, if any, there are." (Wikipedia, Publish–subscribe pattern)
As Wikipedia explains, the pub-sub pattern decouples the publishers (P) from the possible subscribers or consumers (C), so a publisher doesn't need any knowledge of the components interested in receiving its messages.
To implement this pattern we need a central service in charge of receiving the messages from the publishers and resending each message to every interested subscriber. This central service is usually called the broker (B).
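As an illustration of the broker's role, here is a minimal in-memory sketch in Python. The `Broker` class and its `subscribe`/`publish` methods are hypothetical names chosen for this example, not the API of any real product; a real broker adds networking, persistence and per-consumer queues.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str], None]


class Broker:
    """Hypothetical in-memory broker, for illustration only."""

    def __init__(self) -> None:
        # topic name -> callbacks of the subscribers interested in it
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        # The publisher only names a topic; the broker resends the message
        # to every subscriber of that topic and returns how many got it.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
billing: List[str] = []
audit: List[str] = []
broker.subscribe("orders", billing.append)
broker.subscribe("orders", audit.append)
print(broker.publish("orders", "order created"))  # 2
print(billing, audit)  # ['order created'] ['order created']
print(broker.publish("payments", "no listeners"))  # 0
```

Note that `publish` succeeds even when nobody is subscribed: the publisher never knows who, if anyone, receives the message.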

Some common characteristics of a broker are:
  • Allows broadcasting messages to several consumers.
  • Allows consumers to configure which messages they are interested in, using some kind of consumer-defined rule:
    • By topic
    • By a regular expression over the topic
    • By a query over some attributes of the message
  • Allows publishers to send messages and attach some metadata to them:
    • Attributes
    • A destination topic
    • Other metadata such as priority, TTL, etc.
  • Allows several consumers to receive the same message.
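The consumer-defined rules listed above can be modeled as predicates over a message. The sketch below assumes a hypothetical `Message` type whose `attributes` dictionary carries the publisher's metadata (here a made-up `severity` and `ttl_seconds`), plus hypothetical helper functions for each kind of rule; real brokers expose their own syntax for topics, patterns and filters.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Message:
    topic: str
    body: str
    # Publisher-supplied metadata, e.g. attributes, priority or TTL (illustrative).
    attributes: Dict[str, Any] = field(default_factory=dict)


# A subscription rule is a predicate over a message (hypothetical helpers below).
Rule = Callable[[Message], bool]


def by_topic(topic: str) -> Rule:
    # Exact match on the topic name.
    return lambda m: m.topic == topic


def by_topic_regex(pattern: str) -> Rule:
    # The whole topic name must match the regular expression.
    compiled = re.compile(pattern)
    return lambda m: compiled.fullmatch(m.topic) is not None


def by_attributes(**expected: Any) -> Rule:
    # Every expected attribute must be present with exactly that value.
    return lambda m: all(k in m.attributes and m.attributes[k] == v
                         for k, v in expected.items())


def route(msg: Message, subscriptions: Dict[str, Rule]) -> List[str]:
    # The broker delivers a copy to every subscription whose rule matches.
    return [name for name, rule in subscriptions.items() if rule(msg)]


msg = Message("logs.eu.error", "disk full", {"severity": "high", "ttl_seconds": 60})
subscriptions = {
    "eu-logs": by_topic_regex(r"logs\.eu\..*"),
    "all-errors": by_topic_regex(r"logs\..*\.error"),
    "low-severity": by_attributes(severity="low"),
}
print(route(msg, subscriptions))  # ['eu-logs', 'all-errors']
```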
Another typical characteristic, and in fact the most important one, is that the broker usually keeps a queue of pending messages for each consumer.
This queue has all the characteristics explained in the Queues vs Distributed log post. For example, the broker can load-balance the messages among a group of consumers.
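A rough sketch of the per-consumer queue idea, assuming a hypothetical `QueueingBroker` that keeps one queue per subscription: publishing appends a copy to every subscription's queue (broadcasting), while several workers pulling from the same subscription's queue split its messages between them (load balancing). The round-robin loop only simulates workers taking turns; in practice whichever worker is free pulls the next message.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class QueueingBroker:
    """Hypothetical broker keeping one queue of pending messages per subscription."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[str]] = {}

    def subscribe(self, subscription: str) -> None:
        self.queues.setdefault(subscription, deque())

    def publish(self, message: str) -> None:
        # Broadcasting: every subscription gets its own copy in its own queue.
        for queue in self.queues.values():
            queue.append(message)

    def pull(self, subscription: str) -> Optional[str]:
        # Workers sharing a subscription pull from the same queue, so each
        # message in it is handed to only one of them (load balancing).
        queue = self.queues[subscription]
        return queue.popleft() if queue else None


broker = QueueingBroker()
broker.subscribe("email-senders")  # shared by two competing workers
broker.subscribe("audit")          # one independent consumer
for i in range(4):
    broker.publish(f"msg-{i}")

workers = ["worker-a", "worker-b"]
assigned: Dict[str, List[str]] = {w: [] for w in workers}
turn = 0
msg = broker.pull("email-senders")
while msg is not None:
    # Simulate the workers taking turns at pulling the next message.
    assigned[workers[turn % len(workers)]].append(msg)
    turn += 1
    msg = broker.pull("email-senders")

print(assigned)  # {'worker-a': ['msg-0', 'msg-2'], 'worker-b': ['msg-1', 'msg-3']}
print(list(broker.queues["audit"]))  # ['msg-0', 'msg-1', 'msg-2', 'msg-3']
```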

In summary, the main characteristics of a pub-sub system are:
  • Publishers:
    • They don't know the final destination of the messages (decoupling).
    • They send the messages to a topic/exchange/abstract destination.
    • They can add attributes to the messages.
  • Consumers/Subscribers:
    • Each one tells the broker which messages it is interested in (using the name of the topic, a regular expression over the topic, or a combination of attributes of the message).
    • Each message can be consumed by several consumers (broadcasting).


Pros:

  • Great decoupling between publishers/consumers.
  • Allows easy creation of flexible communication topologies.
  • All the pros of queues:
    • Easy to implement.
    • Easy, nearly unlimited horizontal scalability.
    • Fault tolerance is very easy to implement (timeout and re-queueing of the message).
    • Allows an easy trade-off between latency (time in the queue + processing time) and the cost of the concurrent workers.
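The "time out and re-queue" fault-tolerance mechanism can be sketched as follows, assuming a hypothetical `AckQueue` in which a consumer must acknowledge each message before a deadline. Time is passed in explicitly as `now` so the example is deterministic; a real broker would use its own clock and redelivery settings.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class AckQueue:
    """Hypothetical queue with acknowledgements and timeout-based redelivery."""

    def __init__(self, ack_timeout: float) -> None:
        self.ack_timeout = ack_timeout
        self.pending: Deque[str] = deque()
        # delivery id -> (message, deadline) for delivered but unacknowledged messages
        self.in_flight: Dict[int, Tuple[str, float]] = {}
        self._next_id = 0

    def put(self, message: str) -> None:
        self.pending.append(message)

    def get(self, now: float) -> Optional[Tuple[int, str]]:
        self.requeue_expired(now)
        if not self.pending:
            return None
        message = self.pending.popleft()
        delivery_id = self._next_id
        self._next_id += 1
        self.in_flight[delivery_id] = (message, now + self.ack_timeout)
        return delivery_id, message

    def ack(self, delivery_id: int) -> None:
        # The consumer finished processing; the message is gone for good.
        self.in_flight.pop(delivery_id, None)

    def requeue_expired(self, now: float) -> None:
        # A consumer that crashed or stalled never acks, so once its deadline
        # passes the message goes back to the queue for another worker.
        expired = [d for d, (_, deadline) in self.in_flight.items() if deadline <= now]
        for delivery_id in expired:
            message, _ = self.in_flight.pop(delivery_id)
            self.pending.append(message)


q = AckQueue(ack_timeout=30.0)
for m in ["m1", "m2", "m3"]:
    q.put(m)

d1, first = q.get(now=0.0)   # worker A takes m1 and never acks it (crash)
d2, second = q.get(now=1.0)  # worker B takes m2 and acks it
q.ack(d2)
order = [q.get(now=40.0)[1], q.get(now=41.0)[1]]  # after the 30 s timeout
print(first, second, order)  # m1 m2 ['m3', 'm1']
```

The output also shows the price of this mechanism: `m1` is redelivered after `m3`, so order is lost, and if worker A was only slow rather than dead, `m1` ends up processed twice. These are the same ordering and duplicate caveats that appear in the cons.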

Cons (same as queues):

  • Message order is not guaranteed.
  • Duplicate deliveries are usually possible.
  • Requires more resources from the broker, so in some scenarios it scales worse than other solutions (not important for the majority of cases).

Use a Pub-Sub system when:

In any use case that requires flexibility and broadcasting of messages but doesn't require the order of the messages to be guaranteed. In these scenarios, a pub-sub system is a great solution because it covers all the use cases of a queue and adds the flexibility of sending the same message to several queues/consumers.

The real potential of this kind of solution appears when you combine several brokers (using federation or replication) to create flexible topologies that connect several systems and services.
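To make the federation idea concrete, here is a toy sketch assuming two hypothetical in-process brokers (`madrid` and `dublin` are made-up names standing for two data centers). The `federate` helper is also hypothetical: the link is simply a subscriber on one broker that republishes into another, so consumers in the second data center receive messages published in the first without the publisher knowing they exist.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str], None]


class Broker:
    """Minimal hypothetical topic broker standing in for one data center."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        for callback in list(self._subscribers[topic]):
            callback(message)


def federate(source: Broker, target: Broker, topic: str) -> None:
    # The federation link is just another subscriber on the source broker
    # that republishes each message on the same topic in the target broker.
    source.subscribe(topic, lambda message: target.publish(topic, message))


madrid = Broker("madrid")
dublin = Broker("dublin")
federate(madrid, dublin, "reference-data")  # one-way link: madrid -> dublin

received: List[str] = []
dublin.subscribe("reference-data", received.append)  # a local consumer in dublin
madrid.publish("reference-data", "country-codes v2")
print(received)  # ['country-codes v2']
```

The link is deliberately one-way: two brokers federated in both directions with this naive helper would bounce every message back and forth forever, so a real topology needs some loop prevention, such as a hop limit.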

Use cases / Examples:

  • Any good scenario for queues.
  • Async processing of requests.
  • When we can tolerate losing some data (or some delays), pub-sub systems are great for:
    • Log / Monitoring info distribution and processing.
    • Asynchronous processing of email requests.
    • Processing of independent batch jobs.
  • Using federation:
    • Info replication and distribution between data centers.
    • Global distribution of periodic information.
    • Global distribution of reference info.

