Atomicity Across Services Is a Different Animal
Distributed Transactions
Why cross-service atomicity is hard, two-phase commit and its blocking problem, the Saga pattern, and the transactional outbox.
What you'll learn
- Explain why cross-service atomicity can't use a local transaction
- Describe 2PC and its blocking failure mode
- Apply the Saga and transactional-outbox patterns
A BEGIN … COMMIT gives you atomicity inside one database. But the moment a
business operation spans multiple services or databases — place an order
and charge a card and reserve inventory, each owned by a different service —
that local transaction is useless. There’s no shared COMMIT across machines you
don’t control. Getting “all of it happens or none of it does” across a network is
one of the genuinely hard problems in system design.
Why it’s hard
Three forces conspire:
- No shared lock manager. Each service has its own database; nothing can hold a transaction open across all of them.
- The network fails mid-operation. Service A succeeds, then the call to service B times out — did B happen or not? You can’t tell from a timeout.
- Partial failure is the default. With N services, the probability that all succeed shrinks, and you must define what happens to the ones that already committed when a later one fails.
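To put a rough number on that last point, here is a back-of-the-envelope sketch. It assumes every call succeeds independently with the same probability, which real systems only approximate; the helper name `allSucceed` and the 99.9% figure are illustrative, not measurements.

```javascript
// Probability that all n calls succeed when each succeeds with probability p.
// Independence between calls is an assumption of this sketch.
function allSucceed(p, n) {
  return Math.pow(p, n);
}

for (const n of [1, 3, 10, 50]) {
  const pct = (allSucceed(0.999, n) * 100).toFixed(2);
  console.log(`${n} services: ${pct}% chance that all succeed`);
}
```

Under these assumptions, a 50-call operation built from 99.9%-reliable calls completes cleanly only about 95% of the time, so the "what do we do with the half-finished ones" question comes up routinely, not rarely.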
Two-phase commit (2PC)
The textbook answer is two-phase commit, coordinated by a central coordinator:
- Prepare phase — the coordinator asks every participant “can you commit?” Each does the work, locks the rows, and votes yes (promising it can commit) or no.
- Commit phase — if all voted yes, the coordinator tells everyone to commit; if any voted no, it tells everyone to abort.
It gives true atomicity, but at a brutal cost: it blocks. Between voting yes and hearing the verdict, participants hold their locks, and a participant that has voted yes can no longer decide the outcome on its own. If the coordinator crashes at that moment, participants are stuck holding locks until it recovers, unable to commit or abort. That blocking, plus the latency of two synchronous round trips and the coordinator as a bottleneck, is why 2PC is largely avoided in high-scale microservice systems.
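The two phases and the blocking window can be sketched in memory. This is a toy model, not a real protocol implementation: `Participant`, `twoPhaseCommit`, and the `crashBeforeVerdict` flag are hypothetical names, and real participants would be separate databases reached over a network.

```javascript
// Toy two-phase commit. All names are illustrative assumptions.
class Participant {
  constructor(name, canCommit = true) {
    this.name = name;
    this.canCommit = canCommit;
    this.state = 'idle'; // idle -> prepared -> committed | aborted
  }
  prepare() {
    if (!this.canCommit) {
      this.state = 'aborted';
      return false; // vote no
    }
    this.state = 'prepared'; // locks are held from here until the verdict
    return true; // vote yes
  }
  commit() { this.state = 'committed'; }
  abort() { this.state = 'aborted'; }
}

function twoPhaseCommit(participants, { crashBeforeVerdict = false } = {}) {
  // Phase 1: ask everyone to prepare and collect the votes.
  const votes = participants.map((p) => p.prepare());
  if (crashBeforeVerdict) {
    // Simulated coordinator crash: no participant ever hears the verdict.
    return 'coordinator crashed';
  }
  // Phase 2: commit only on a unanimous yes, otherwise abort everyone.
  const allYes = votes.every(Boolean);
  for (const p of participants) {
    if (allYes) p.commit(); else p.abort();
  }
  return allYes ? 'committed' : 'aborted';
}

const stuck = [new Participant('orders'), new Participant('payments')];
console.log(twoPhaseCommit(stuck, { crashBeforeVerdict: true }));
console.log(stuck.map((p) => `${p.name}: ${p.state}`));
```

After the simulated crash both participants remain in the `prepared` state. In a real system that is the point where row locks stay held and other transactions queue up behind them.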
The Saga pattern
The pragmatic alternative: give up on a single atomic transaction and instead model the operation as a sequence of local transactions, each with a compensating action that undoes it. If step 4 fails, you run the compensations for steps 3, 2, 1 in reverse — a semantic rollback, not a database one.
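A minimal sketch of that reverse-order compensation follows. The runner `runSaga` and the step shape `{ name, action, compensate }` are assumptions made for illustration; the empty `async` functions stand in for calls to real services.

```javascript
// Minimal saga runner: run local steps in order and, on failure, compensate
// the steps that already completed, most recent first. Names are hypothetical.
async function runSaga(steps, log = []) {
  const completed = [];
  for (const step of steps) {
    try {
      await step.action();
      log.push(`done:${step.name}`);
      completed.push(step);
    } catch (err) {
      // Semantic rollback: undo committed steps in reverse order.
      for (const done of completed.reverse()) {
        await done.compensate();
        log.push(`undo:${done.name}`);
      }
      return { ok: false, failedAt: step.name, log };
    }
  }
  return { ok: true, log };
}

const steps = [
  { name: 'createOrder', action: async () => {}, compensate: async () => {} },
  { name: 'chargeCard', action: async () => {}, compensate: async () => {} },
  {
    name: 'reserveStock',
    action: async () => { throw new Error('out of stock'); },
    compensate: async () => {},
  },
];

runSaga(steps).then((result) => console.log(result));
```

Here the third step fails, so the log shows `chargeCard` being undone before `createOrder`. The sketch ignores one real concern: a compensation can fail too, so production runners typically retry compensations and persist saga state between steps.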
A saga is eventually consistent (there’s a window where some steps have run and others haven’t) but it’s non-blocking and highly available — the BASE philosophy applied to a workflow. Two coordination styles:
Choreography — no central brain. Each service emits an event; the next service listens and reacts. Decentralized and loosely coupled, but the overall flow is implicit and hard to follow as it grows.
Orchestration — a central orchestrator explicitly tells each service what to do next and triggers compensations on failure. The flow lives in one place (easier to reason about and debug) at the cost of a coordinating component.
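As a rough illustration of the choreographed style, the sketch below wires three pretend services together through events only. The tiny in-memory `on`/`emit` bus stands in for a real broker, and the topic names and the "totals over 100 are declined" rule are invented for the example.

```javascript
// Tiny in-memory pub/sub standing in for a real broker (an assumption of
// this sketch). Topic names and the payment rule are made up.
const handlers = new Map();
function on(topic, handler) {
  handlers.set(topic, [...(handlers.get(topic) ?? []), handler]);
}
function emit(topic, event) {
  for (const handler of handlers.get(topic) ?? []) handler(event);
}

const trace = [];

// Payment service: reacts to new orders, then announces the outcome.
on('order.created', (order) => {
  trace.push('payments: charging card');
  emit(order.total > 100 ? 'payment.failed' : 'payment.succeeded', order);
});

// Inventory service: reacts to successful payments.
on('payment.succeeded', () => trace.push('inventory: reserving stock'));

// Order service: compensates by cancelling when payment fails.
on('payment.failed', () => trace.push('orders: cancelling order'));

emit('order.created', { id: 1, total: 250 });
console.log(trace);
```

No single function spells out the whole flow; it only emerges from who subscribes to what. That is the readability cost of choreography, and it is what an orchestrator trades away by holding the sequence in one place.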
| | 2PC | Saga |
|---|---|---|
| Atomicity | true, immediate | eventual (via compensation) |
| Blocking | yes (holds locks) | no |
| Availability | lower | higher |
| Complexity | coordinator + locks | compensations to design |
| Fits | few coupled DBs | microservices at scale |
The transactional outbox
Sagas lean on events (“order created,” “payment succeeded”), which surfaces a nasty dual-write problem: you must update your database and publish an event — two systems, no shared transaction. If you write the DB then crash before publishing, the rest of the saga never fires; publish first then crash, and you announce something that didn’t happen.
The transactional outbox solves it elegantly: write the event into an
outbox table in the same local transaction as your business data. Now the
data change and the intent-to-publish commit atomically. A separate relay process
then reads the outbox and publishes to the message broker, marking rows sent.
// One local ACID transaction covers BOTH the order and the event-to-publish.
// No dual write: they commit together or not at all.
async function placeOrder(pool, order) {
  const db = await pool.connect();
  try {
    await db.query('BEGIN');
    await db.query(
      'INSERT INTO orders (id, item, total) VALUES ($1, $2, $3)',
      [order.id, order.item, order.total],
    );
    await db.query(
      'INSERT INTO outbox (topic, payload) VALUES ($1, $2)',
      ['order.created', JSON.stringify(order)], // event lives in the same tx
    );
    await db.query('COMMIT');
  } catch (err) {
    await db.query('ROLLBACK');
    throw err;
  } finally {
    db.release();
  }
}

// A separate relay (poller or CDC) ships outbox rows to the broker, at least once.
async function relay(pool, broker) {
  const { rows } = await pool.query(
    "SELECT id, topic, payload FROM outbox WHERE sent = false ORDER BY id LIMIT 100",
  );
  for (const row of rows) {
    await broker.publish(row.topic, row.payload); // saga continues here
    await pool.query('UPDATE outbox SET sent = true WHERE id = $1', [row.id]);
  }
}

The JavaScript angle
This pattern is squarely in Node’s wheelhouse: the relay is a tiny worker, and
the broker is the same BullMQ/Kafka layer the comms section covers. Because the
relay publishes at least once (it may crash after publishing but before
marking sent), downstream consumers must be idempotent — handling a
duplicate event harmlessly. Idempotency keys get their own lesson later; just
note the outbox requires them.
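As a preview of what "idempotent" means in practice, here is a minimal sketch of a consumer that ignores redelivered events. It assumes every event carries a unique `id` (for example, the outbox row id), and it keeps the set of seen ids in memory; a real consumer would record them in a durable store, ideally in the same transaction as the side effect. `makeConsumer` and the event shape are hypothetical.

```javascript
// Idempotent consumer sketch: remember processed event ids and skip repeats.
// The in-memory Set is an assumption; it would not survive a restart.
function makeConsumer(handle) {
  const seen = new Set();
  return function consume(event) {
    if (seen.has(event.id)) return false; // duplicate delivery: ignore
    handle(event);
    seen.add(event.id);
    return true;
  };
}

let reserved = 0;
const consume = makeConsumer((event) => { reserved += event.payload.quantity; });

const event = { id: 'evt-1', topic: 'order.created', payload: { quantity: 2 } };
consume(event);
consume(event); // the relay re-published after crashing before marking sent
console.log(reserved); // 2
```

The second delivery is dropped, so stock is reserved once even though the event arrived twice.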
We’ve kept everything inside databases so far. The last piece of the data layer is what to do with the data that doesn’t belong in a database at all — large files and media, which go to object storage.