Outbox vs Persistent Publisher: Choosing the Right One

Work in Progress

This documentation section is still a draft that needs to be reviewed.

When using KafkaEventPublisher, events are sent to the Kafka broker in an ephemeral manner: once an event has been published, there is no way to generate the same event again. The Persistent Publisher is a Hexkit provider that solves this problem by storing a representation of each event in MongoDB as it is published. This makes the database the de facto backup of Kafka, to an extent. The Persistent Publisher class is called PersistentEventPublisher and is located in the MongoKafka subpackage.

The other MongoKafka provider, the Outbox Publisher, solves a different problem. Microservices sometimes need to share data without sharing access to the same database. The Outbox Publisher is essentially a MongoDB DAO retrofitted with a Kafka publisher. It attaches a small amount of metadata to documents as it upserts or deletes data. Upon each modifying operation, the provider runs the given document through a user-defined function to produce a Kafka payload, which it then publishes. If a document is deleted, the provider merely publishes a tombstone record with the object ID as the key. When another service wants to access the data in question, it subscribes to the Kafka topic housing the outbox events. The Outbox Publisher class is called MongoKafkaDaoPublisher and is also located in the MongoKafka subpackage.
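
The outbox behavior described above can be illustrated with a toy, in-memory sketch. It is not the MongoKafkaDaoPublisher API: the class InMemoryOutboxDao, its methods, and the dto_to_event parameter are hypothetical names, a dict stands in for the MongoDB collection, and a list stands in for the Kafka topic.

```python
from typing import Any, Callable, Optional

# Hypothetical stand-ins for illustration; none of these names come from hexkit.
# An event is (key, payload); a payload of None represents a tombstone record.
Event = tuple[str, Optional[dict[str, Any]]]


class InMemoryOutboxDao:
    """Toy DAO that emits an event after every modifying operation."""

    def __init__(self, dto_to_event: Callable[[dict[str, Any]], dict[str, Any]]):
        # User-defined function that turns a stored document into an event payload
        self._dto_to_event = dto_to_event
        self.documents: dict[str, dict[str, Any]] = {}  # stands in for the collection
        self.published: list[Event] = []  # stands in for the Kafka topic

    def upsert(self, doc_id: str, document: dict[str, Any]) -> None:
        """Store the document, then publish its payload keyed by the object ID."""
        self.documents[doc_id] = document
        self.published.append((doc_id, self._dto_to_event(document)))

    def delete(self, doc_id: str) -> None:
        """Remove the document, then publish a tombstone keyed by the object ID."""
        del self.documents[doc_id]
        self.published.append((doc_id, None))


# Only the name is shared with other services in this example
dao = InMemoryOutboxDao(dto_to_event=lambda doc: {"name": doc["name"]})
dao.upsert("user-1", {"name": "John Doe", "email": "doe@example.com"})
dao.delete("user-1")
```

A consuming service would read the topic and treat a None payload as a deletion of the object with that key.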

Both the PersistentEventPublisher and the MongoKafkaDaoPublisher save data to the database, but they are intended for use in different situations, and are not interchangeable.

When to use the Persistent Publisher (PersistentEventPublisher)

The Persistent Publisher should be employed when:

- The publisher needs to be used like a normal KafkaEventPublisher.
- The data published does not represent a stateful entity, like a domain object.
- You want to be able to publish the same Kafka event again.
  - Same topic, event type, correlation ID, event ID, payload, etc.
  - Often these are events that initiate a sequence of events, or Kafka flow.

When to use the Outbox Publisher (MongoKafkaDaoPublisher)

The Outbox Publisher should be used when:

- The publisher needs to be used like a normal MongoDbDao.
- The data published represents the latest state of a domain object.
  - Examples: user data, access requests, order status, etc.
- Your service needs to inform other services about changes to a given MongoDB collection.
- You're focused on storing data.

The Outbox Publisher cannot publish arbitrary events like the Persistent Publisher; it behaves like the MongoDB DAO and publishes events in the background.

Further Details

How the Two Publishers Store Information

The Persistent Publisher stores data according to the Pydantic model defined here:

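from collections.abc import Mapping
from datetime import datetime

from pydantic import BaseModel, Field

# Ascii and JsonObject are hexkit custom types (import path assumed)
from hexkit.custom_types import Ascii, JsonObject


class PersistentKafkaEvent(BaseModel):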
    """A model representing a kafka event to be published and stored in the database."""

    id: str = Field(
        ...,
        description="The unique ID of the event. If the topic is set to be compacted,"
        + " the ID is set to the topic and key in the format <topic>:<key>. Otherwise"
        + " the ID is set to a random UUID.",
    )
    topic: Ascii = Field(..., description="The event topic")
    type_: Ascii = Field(..., description="The event type")
    payload: JsonObject = Field(..., description="The event payload")
    key: Ascii = Field(..., description="The event key")
    headers: Mapping[str, str] = Field(
        default_factory=dict,
        description="Non-standard event headers. Correlation ID and event type are"
        + " transmitted as event headers, but added as such within the publisher"
        + " protocol. The headers here are any additional header that need to be sent.",
    )
    correlation_id: str = Field(..., description="The event correlation ID")
    created: datetime = Field(
        ..., description="The timestamp of when the event record was first inserted"
    )
    published: bool = Field(False, description="Whether the event has been published")
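
The ID rule stated in the id field description can be sketched as a small function. The name make_event_id and its compacted flag are hypothetical and only illustrate the rule; they are not taken from hexkit.

```python
from uuid import uuid4


def make_event_id(topic: str, key: str, compacted: bool) -> str:
    """Return '<topic>:<key>' for a compacted topic, otherwise a random UUID.

    Hypothetical helper that mirrors the `id` field description above.
    """
    if compacted:
        # Same topic and key always map to the same stored record
        return f"{topic}:{key}"
    # Every event gets its own stored record
    return str(uuid4())


compacted_id = make_event_id("users", "user-1", compacted=True)
random_id = make_event_id("users", "user-1", compacted=False)
```

With a deterministic ID, storing a newer event for the same topic and key would replace the older record, which presumably mirrors how a compacted topic keeps only the latest event per key.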

By contrast, the Outbox Publisher saves the domain object like a DAO normally would, but adds a __metadata__ field. If we were saving data from a User class with user_id, name, and email fields, the saved data might look like this:

{
  "__metadata__": {
    "correlation_id": <UUID4>,
    "published": true,
    "deleted": false
  },
  "_id": <UUID4>, // hexkit converts models' ID field names to the MongoDB `_id`
  "name": "John Doe",
  "email": "doe@example.com"
}
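
As a rough illustration of the mapping shown above, the following sketch builds such a document from a plain dict. The function to_stored_document and its parameters are hypothetical; the provider performs this conversion internally, and its actual implementation may differ.

```python
from typing import Any
from uuid import uuid4


def to_stored_document(
    dto: dict[str, Any],
    id_field: str,
    correlation_id: str,
    published: bool = False,
    deleted: bool = False,
) -> dict[str, Any]:
    """Rename the ID field to `_id` and attach the `__metadata__` sub-document.

    Hypothetical helper for illustration only.
    """
    document = dict(dto)  # copy so the caller's dict is left untouched
    document["_id"] = document.pop(id_field)
    document["__metadata__"] = {
        "correlation_id": correlation_id,
        "published": published,
        "deleted": deleted,
    }
    return document


user = {"user_id": str(uuid4()), "name": "John Doe", "email": "doe@example.com"}
stored = to_stored_document(
    user, id_field="user_id", correlation_id=str(uuid4()), published=True
)
```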

Republishing

Both providers expose the methods publish_pending and republish. The former only publishes documents where published is false, while the latter will republish all its documents regardless of publish status. We suggest implementing a method in your service accessible via CLI in order to trigger republishing as a job. In both cases, the provider will use the stored correlation ID when republishing data.
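
The difference between the two methods can be shown with a toy, in-memory sketch. Only the method names publish_pending and republish come from the text above; StoredEvent, ToyPublisher, and the synchronous signatures are assumptions made for illustration, and the real provider methods may well be asynchronous.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StoredEvent:
    """Hypothetical stand-in for a stored event or outbox document."""

    key: str
    correlation_id: str
    payload: dict[str, Any] = field(default_factory=dict)
    published: bool = False


class ToyPublisher:
    """Toy publisher; `sent` stands in for the Kafka topic."""

    def __init__(self, events: list[StoredEvent]):
        self.events = events
        self.sent: list[tuple[str, str]] = []  # (key, correlation ID) per send

    def _send(self, event: StoredEvent) -> None:
        # The stored correlation ID is reused rather than generating a new one
        self.sent.append((event.key, event.correlation_id))
        event.published = True

    def publish_pending(self) -> None:
        """Publish only the events not yet marked as published."""
        for event in self.events:
            if not event.published:
                self._send(event)

    def republish(self) -> None:
        """Publish every stored event, regardless of publish status."""
        for event in self.events:
            self._send(event)


publisher = ToyPublisher(
    [
        StoredEvent(key="a", correlation_id="corr-1", published=True),
        StoredEvent(key="b", correlation_id="corr-2"),
    ]
)
publisher.publish_pending()
sent_after_pending = list(publisher.sent)  # only "b" was pending
publisher.republish()  # sends "a" and "b" again
```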