Managing Distributed Transactions with the Saga Pattern

In a microservices architecture, we may adopt a database-per-microservice approach so that each domain service can use the data store that best serves the type of data that the microservice handles. With a database-per-microservice approach, we can scale out our data stores independently, and should one data store fail, that failure will be isolated from other services.

However, this approach gets complicated when we need to perform operations in a transactional manner. Transactions must be ACID (atomic, consistent, isolated and durable). Now within a single service, this isn’t too challenging. Across multiple microservices, this becomes difficult to manage. Distributed transactions require all services in a transaction to commit or roll back before the transaction can proceed. Not all data stores support this model.

Interprocess communication does allow separate processes to share data, but all microservices would have to be available for transactions to commit. A better approach would be to implement the Saga pattern, which can help us manage data consistency across microservices in distributed transaction scenarios.

A saga is a sequence of transactions in which each transaction updates a service and publishes a message or event to trigger the next step. Should a step fail, the saga will perform compensating transactions that roll back the preceding transactions that succeeded.

In this article, I’ll do a quick refresh on what ACID transactions mean, before stepping into the weeds of the Saga pattern, what approaches we can take to implement the Saga pattern, things we need to keep in mind when implementing the Saga pattern, and when we should use the Saga pattern.

Remind me, what are ACID transactions again?

A transaction is a single unit of work that can be made up of multiple operations. Within a transaction, events change state on entities, and commands capture all the information required to perform an action on an entity.

Transactions must be ACID. In a microservices architecture, ACID means:

Atomicity means that all operations in a transaction occur together, or none of them occur at all.
Consistency means that the transaction takes data from one valid state to another.
Isolation guarantees that concurrent transactions would produce the same data state that transactions executed sequentially would have produced.
Durability ensures that transactions that are committed remain that way when our systems fail.
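To make the single-service case concrete, here is a minimal sketch of a local ACID transaction using Python's built-in sqlite3 module. The accounts table, the transfer function and the balances are assumptions made up for illustration. The point it shows is atomicity: either both updates commit, or neither does.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` between two accounts as one atomic unit of work."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balance of every account, keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


# Hypothetical data: two accounts in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)       # succeeds: both updates commit together
try:
    transfer(conn, "alice", "bob", 500)  # fails: both updates are rolled back
except ValueError:
    pass

print(balances(conn))
```

Within one database, the engine gives us this guarantee for free. Once the two accounts live in different services with different data stores, there is no single connection to roll back, which is the gap the Saga pattern tries to fill.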

What is the Saga Pattern?

The Saga pattern uses a sequence of local transactions (transactions within a service). Each of these local transactions updates that service's data store and then sends a message or event to trigger the next local transaction in the saga.

If a local transaction fails, the saga performs compensating transactions that undo the changes made by the preceding local transactions.
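The flow above can be sketched in a few lines of Python. This is a simplified, in-process illustration rather than a production implementation: SagaStep, run_saga and the order, stock and payment steps are all hypothetical names, and each "local transaction" is just a function that appends to a list.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # the local transaction
    compensation: Callable[[], None]  # undoes the action if a later step fails


def run_saga(steps: list[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            # The failed step made no change, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensation()
            return False
    return True


log: list[str] = []


def fail_payment() -> None:
    raise RuntimeError("card declined")


steps = [
    SagaStep("order", lambda: log.append("order created"),
             lambda: log.append("order cancelled")),
    SagaStep("stock", lambda: log.append("stock reserved"),
             lambda: log.append("stock released")),
    SagaStep("payment", fail_payment,
             lambda: log.append("payment refunded")),
]

succeeded = run_saga(steps)
print(succeeded, log)
```

Note that the compensations run in reverse order, so the most recent change is undone first. In a real system each action and compensation would be a call or message to a separate service, and the compensations themselves would need to be retried until they succeed.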

Compensable transactions are transactions that can potentially be reversed by processing another transaction with the opposite effect. A saga can also contain a pivot transaction, which is the go/no-go point: if the pivot transaction commits, the saga runs until it completes. A pivot transaction can be a transaction that is neither compensable nor retryable, or it can be the last compensable transaction or the first retryable transaction in the saga. Retryable transactions follow the pivot transaction and are guaranteed to succeed.
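One way to read this is as an ordering rule: steps that can be compensated come first, then the pivot, then the retryable steps. The sketch below checks that ordering. StepKind and is_valid_saga are hypothetical names, and the check is a simplification that treats the pivot as an explicitly marked step (zero pivots is allowed, since the pivot may also be the last compensable or first retryable step).

```python
from enum import Enum


class StepKind(Enum):
    COMPENSABLE = 1  # can be undone by a compensating transaction
    PIVOT = 2        # go/no-go point of the saga
    RETRYABLE = 3    # runs after the pivot and is retried until it succeeds


def is_valid_saga(kinds: list[StepKind]) -> bool:
    """Check the ordering: compensable steps, at most one pivot, then retryable steps."""
    if sum(kind is StepKind.PIVOT for kind in kinds) > 1:
        return False
    order = [kind.value for kind in kinds]
    # The enum values increase in the required order, so a valid saga is sorted.
    return order == sorted(order)


C, P, R = StepKind.COMPENSABLE, StepKind.PIVOT, StepKind.RETRYABLE
print(is_valid_saga([C, C, P, R]))  # compensable steps, pivot, retryable step
print(is_valid_saga([C, R, C]))     # a compensable step after a retryable one
```

A check like this could run when a saga is defined, so that a step that can't be undone never ends up in front of a step that might still fail.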

There are two ways that we can implement the Saga pattern: Choreography or Orchestration.

Approach 1: Choreography

In this approach, the services in the saga exchange events without a single centralized point of control. Each local transaction publishes domain events that trigger local transactions in other services.
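Here is a rough sketch of what choreography looks like, using a toy in-memory event bus as a stand-in for a real message broker. The EventBus class, the event names and the "quantity of 5 or less is in stock" rule are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """A toy in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.history: list[str] = []

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.history.append(event_type)
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()


def reserve_stock(order: dict) -> None:
    # Inventory service: run the local transaction, then publish the outcome.
    if order["quantity"] <= 5:
        bus.publish("StockReserved", order)
    else:
        bus.publish("StockRejected", order)


def take_payment(order: dict) -> None:
    # Payment service: reacts to the inventory event, not to a central controller.
    bus.publish("PaymentTaken", order)


def cancel_order(order: dict) -> None:
    # Order service: compensating transaction triggered by the failure event.
    bus.publish("OrderCancelled", order)


bus.subscribe("OrderCreated", reserve_stock)
bus.subscribe("StockReserved", take_payment)
bus.subscribe("StockRejected", cancel_order)

bus.publish("OrderCreated", {"order_id": 1, "quantity": 2})  # happy path
bus.publish("OrderCreated", {"order_id": 2, "quantity": 9})  # triggers compensation
print(bus.history)
```

Notice that no component knows the whole workflow. The sequence only exists in the subscriptions, which is what makes this approach light to start with and harder to follow as more services join.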

This approach is great for simple workflows that only involve a few services and don’t need coordination logic. There’s no single point of failure, since responsibilities are distributed across the saga participants, and there’s no additional service to implement or maintain.

However, as your architecture grows, the complexity increases, because it becomes difficult to track which saga participants listen to which commands. The services in the saga may become cyclically dependent on each other, since they need to consume each other’s commands. Integration testing becomes a nightmare, as all services need to be running to simulate a transaction.

Approach 2: Orchestration

The alternative approach is to coordinate your sagas with a central controller that tells all services in the saga which local transactions they need to execute.

The orchestrator handles all transactions and tells every service in the saga which operation to perform based on events. It also interprets the state of each task and handles failures with compensating transactions.
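As a rough sketch, an orchestrator can be modelled as an object that holds the plan, sends commands to participants, tracks the saga's state, and issues compensating commands when a participant reports failure. Participant, OrderSagaOrchestrator, the command names and the state names below are all hypothetical, and the sketch assumes that compensating commands always succeed.

```python
class Participant:
    """Hypothetical service that executes commands sent by the orchestrator."""

    def __init__(self, name: str, failing_commands: frozenset[str] = frozenset()) -> None:
        self.name = name
        self.failing_commands = failing_commands
        self.received: list[str] = []

    def handle(self, command: str) -> bool:
        """Record the command and report whether the local transaction succeeded."""
        self.received.append(command)
        return command not in self.failing_commands


class OrderSagaOrchestrator:
    # Each entry: (participant key, command, compensating command).
    PLAN = [
        ("orders", "CreateOrder", "CancelOrder"),
        ("inventory", "ReserveStock", "ReleaseStock"),
        ("payments", "ChargeCard", "RefundCard"),
    ]

    def __init__(self, participants: dict[str, Participant]) -> None:
        self.participants = participants
        self.state = "STARTED"

    def run(self) -> str:
        done: list[tuple[str, str]] = []
        for service, command, undo in self.PLAN:
            if self.participants[service].handle(command):
                done.append((service, undo))
                continue
            # A step failed: undo the completed steps in reverse order.
            # Assumption: compensations succeed; a real one would retry them.
            self.state = "COMPENSATING"
            for done_service, done_undo in reversed(done):
                self.participants[done_service].handle(done_undo)
            self.state = "ROLLED_BACK"
            return self.state
        self.state = "COMPLETED"
        return self.state


participants = {
    "orders": Participant("orders"),
    "inventory": Participant("inventory"),
    "payments": Participant("payments", failing_commands=frozenset({"ChargeCard"})),
}
final_state = OrderSagaOrchestrator(participants).run()
print(final_state, participants["inventory"].received)
```

The participants here only know their own commands. The whole workflow, including the failure path, lives in the orchestrator, which is both the main benefit of this approach and the reason the orchestrator becomes a component you have to keep available.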

This approach is great for when our workflows have numerous services in the saga, or we know ahead of time that more services will be added. You remove the cyclical dependencies that you can potentially suffer in Choreography, since the orchestrator depends on all participants in the saga. Services in the saga also don’t need to know about commands for other services, providing you with clear separation of concerns.

However, this does introduce design complexity, since you have to implement the coordination logic, and there’s an additional point of failure, since the orchestrator manages the entire workflow.

What do we need to keep in mind when implementing the Saga pattern?

Initially, the Saga pattern is a bit of a challenge to implement. Transactions are distributed rather than local, which can be a pain to coordinate and manage. (Coincidentally, this was the first pattern I was introduced to coming out of university. It was a nightmare!)

This pattern is also a pain to debug and test, and the complexity grows with every service you add. Data can’t be rolled back in this pattern either, since each service commits changes to its local database, so the only way to undo a change is with a compensating transaction.

As with all microservice architectures, you need to be able to handle transient failures, and the services in your saga should be idempotent so that retried or duplicated messages don’t leave your data in an inconsistent state.
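Here is a minimal sketch of both ideas together: a retry helper with exponential backoff, and a handler that remembers which message IDs it has already processed. TransientError, with_retries and PaymentHandler are hypothetical names, and the in-memory set stands in for whatever durable store a real service would use to track processed messages.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, throttling)."""


def with_retries(operation, attempts: int = 3, base_delay: float = 0.01):
    """Call `operation`, retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            time.sleep(base_delay * 2 ** attempt)


class PaymentHandler:
    """Idempotent handler: processing the same message twice has no extra effect."""

    def __init__(self) -> None:
        self.processed_ids: set[str] = set()
        self.total_charged = 0

    def handle(self, message_id: str, amount: int) -> None:
        if message_id in self.processed_ids:
            return  # duplicate delivery, already applied
        self.total_charged += amount
        self.processed_ids.add(message_id)


handler = PaymentHandler()
calls = {"count": 0}


def flaky_delivery() -> None:
    # Simulates a message that is applied, but whose acknowledgement is lost twice.
    calls["count"] += 1
    handler.handle("msg-1", 40)
    if calls["count"] < 3:
        raise TransientError("acknowledgement lost")


with_retries(flaky_delivery)
print(calls["count"], handler.total_charged)
```

The message is delivered three times but the customer is only charged once. Without the processed-ID check, every retry would have charged them again, which is why retries and idempotency need to be designed together.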

Your saga can potentially be made up of several services, so please implement a way to observe all of your services and to track the workflow of the saga!

This pattern can also introduce challenges around data durability, because the data in each service isn’t isolated from other sagas. Lost updates, dirty reads, and non-repeatable reads could all occur within a saga. You may need to implement semantic locks, pessimistic concurrency, versioning, commutative updates, etc. to reduce the effect of these anomalies.
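To illustrate one of these countermeasures, here is a small sketch of versioning used to detect a lost update. VersionedStore, VersionConflict and the stock key are hypothetical, and the dictionary stands in for a real data store that supports conditional writes.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the data."""


class VersionedStore:
    """Toy key-value store where every write must name the version it read."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[int, int]] = {}  # key -> (version, value)

    def read(self, key: str) -> tuple[int, int]:
        return self._rows.get(key, (0, 0))

    def write(self, key: str, expected_version: int, value: int) -> None:
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise VersionConflict(f"{key}: expected v{expected_version}, found v{current_version}")
        self._rows[key] = (current_version + 1, value)


store = VersionedStore()
store.write("stock:widget", 0, 10)

# Two sagas read the same stock level before either one writes.
version_a, stock_a = store.read("stock:widget")
version_b, stock_b = store.read("stock:widget")

store.write("stock:widget", version_a, stock_a - 3)  # saga A wins
try:
    # Saga B's write is based on stale data and would overwrite A's update.
    store.write("stock:widget", version_b, stock_b - 4)
    conflict = False
except VersionConflict:
    conflict = True
    # Re-read and reapply the change on top of the latest value.
    version_b, stock_b = store.read("stock:widget")
    store.write("stock:widget", version_b, stock_b - 4)

print(conflict, store.read("stock:widget"))
```

Without the version check, saga B would have written 6 and silently discarded saga A's reservation. With it, the conflict is detected and both reservations end up reflected in the final stock level.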

When should we use the Saga pattern?

If you need to ensure data consistency in a microservice architecture, or if you need to roll back or compensate when one of the operations in a sequence fails, the Saga pattern can provide you with the ability to do both.

However, if you have cyclic dependencies between your microservices, tightly coupled transactions or compensating transactions that occur in earlier services within your Saga workflow, you should think of alternatives.

Conclusion

In this article, we discussed the Saga pattern, the two ways that we can implement the Saga pattern, what we need to keep in mind when implementing the Saga pattern, and when we should and shouldn’t use the Saga pattern.

If you want to read more about this pattern, check out the following resources:

If you have any questions, feel free to reach out to me on X/Twitter @willvelida

Until next time, Happy coding! 🤓🖥️
