Message Queues and Brokers

Decouple Producers From Consumers and Survive the Spikes

Kafka vs RabbitMQ vs SQS, at-least-once vs at-most-once delivery, ordering, consumer groups and partitions, and when to introduce a queue at all.

10 min read · Level 4/5 · #system-design #message-queue #kafka

What you'll learn
  • Distinguish a log (Kafka) from a broker (RabbitMQ) and a managed queue (SQS)
  • Reason about delivery guarantees, ordering, and consumer groups
  • Decide when adding a queue actually pays off

A message queue sits between a producer and a consumer so they don’t have to be online at the same time, run at the same speed, or scale together. The producer drops a message and moves on; a consumer picks it up whenever it’s ready. That single indirection buys you decoupling, buffering against spikes, and resilience: if the consumer is down, messages wait in the queue instead of being lost.

When to introduce a queue

Don’t add a queue reflexively — it’s a new piece of infrastructure to operate. Reach for one when you see:

  • Slow work on the request path — sending email, transcoding video, charging a card. Push it off the synchronous path so the user isn’t waiting.
  • Spiky load — a flash sale produces 50× normal writes for ten minutes. A queue absorbs the spike and lets consumers drain at a steady rate (load leveling), instead of crushing the database.
  • Fan-out to many consumers — one event (“order placed”) needs to trigger billing, shipping, and analytics independently.
  • Crossing a reliability boundary — you want a retryable buffer between two services that fail independently.

[Figure: Message Queues and Brokers architecture diagram]
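
To make the load-leveling case concrete, here is a minimal simulation sketch. The numbers (10 normal writes per tick, a 50× burst of 500, a consumer that handles 100 per tick) and the `simulate` helper are assumptions chosen for illustration, not measurements from a real system.

```javascript
// Load-leveling sketch: a queue absorbs a burst so the consumer drains at a fixed rate.
// `arrivals` (messages per tick) and `drainRate` are made-up numbers.
function simulate(arrivals, drainRate) {
  let depth = 0; // messages currently waiting in the queue
  const timeline = [];
  let tick = 0;
  // Keep ticking until every arrival is in and the backlog is drained.
  while (tick < arrivals.length || depth > 0) {
    depth += arrivals[tick] ?? 0;                 // producer enqueues this tick's writes
    const processed = Math.min(depth, drainRate); // consumer never exceeds its own rate
    depth -= processed;
    timeline.push({ tick, processed, depth });
    tick += 1;
  }
  return timeline;
}

// Normal load is 10 per tick; the spike is 50x that for two ticks.
const timeline = simulate([10, 500, 500, 10], 100);
const peakProcessed = Math.max(...timeline.map((t) => t.processed));
const peakDepth = Math.max(...timeline.map((t) => t.depth));
console.log({ ticks: timeline.length, peakProcessed, peakDepth });
```

The downstream system never sees more than 100 messages per tick; the cost of the spike shows up as queue depth and extra latency rather than as overload.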

Two mental models: log vs broker

The biggest conceptual split in this space is log versus broker.

A broker (RabbitMQ, SQS) treats messages as transient work items. A message is delivered to a consumer, acknowledged, and deleted. The broker actively routes and tracks each message’s state. Think of it as a smart to-do list that hands out tasks and crosses them off.

A log (Kafka) is an append-only, ordered, durable sequence of records, split into partitions. Consumers don’t “take” messages — they read forward at their own offset, and the records stay for a retention window (hours, days, or forever). Multiple independent consumers can read the same log at different positions, and you can replay from any offset. Think of it as a durable event ledger rather than a task list.
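
The difference between the two models can be sketched with two tiny in-memory classes. `BrokerQueue` and `Log` are hypothetical names invented for this illustration; they mimic the semantics described above and are not how RabbitMQ, SQS, or Kafka are implemented.

```javascript
// Broker model: a message is handed to a consumer, then forgotten once acked.
class BrokerQueue {
  constructor() {
    this.pending = [];         // waiting to be delivered
    this.inFlight = new Map(); // delivered but not yet acked
    this.nextId = 0;
  }
  publish(body) {
    this.pending.push({ id: this.nextId++, body });
  }
  receive() {
    const msg = this.pending.shift();
    if (msg) this.inFlight.set(msg.id, msg);
    return msg; // undefined when nothing is waiting
  }
  ack(id) {
    this.inFlight.delete(id); // the broker drops the message for good
  }
}

// Log model: records are appended and kept; each reader tracks its own offset.
class Log {
  constructor() {
    this.records = [];
  }
  append(body) {
    this.records.push(body);
    return this.records.length - 1; // offset of the new record
  }
  read(offset, max = 10) {
    return this.records.slice(offset, offset + max); // reading removes nothing
  }
}

const queue = new BrokerQueue();
queue.publish('a');
const task = queue.receive();
queue.ack(task.id);
const afterAck = queue.receive(); // undefined: the acked message is gone

const log = new Log();
['a', 'b', 'c'].forEach((r) => log.append(r));
const billing = log.read(0);   // one reader starts from the beginning
const analytics = log.read(2); // another reads from its own position
console.log(afterAck, billing, analytics);
```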

                   Kafka (log)                RabbitMQ (broker)        SQS (managed queue)
Model              Append-only log            Routing broker           Managed broker
After consumption  Retained (replayable)      Deleted on ack           Deleted on delete
Ordering           Per partition              Per queue (best effort)  FIFO queues only
Throughput         Very high                  High                     High (elastic)
Replay             Yes (by offset)            No                       No
Routing logic      Minimal (topic/partition)  Rich (exchanges)         Minimal
Ops burden         High (self-run)            Medium                   None (AWS-managed)

A quick way to choose: Kafka when you want a durable, replayable stream that many systems read (event sourcing, analytics, audit). RabbitMQ when you want rich routing and per-message work semantics. SQS when you want a dead-simple managed queue and don’t want to run a broker at all.

Delivery guarantees

No distributed queue can promise true exactly-once delivery (we’ll prove why in the idempotency lesson). What they offer is:

  • At-most-once — deliver and forget; if the consumer crashes mid-process, the message is lost. Acceptable only when losing a message is fine (e.g., a best-effort metric).
  • At-least-once — the message is redelivered until the consumer acks success. If the consumer crashes after doing the work but before acking, you get a duplicate. This is the common, safe default.
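
The two guarantees come down to when the ack happens relative to the work. The sketch below simulates a consumer that crashes at a chosen point; `runConsumer` and its `ackFirst` and `crashPoint` options are hypothetical names for this toy model, not part of any broker's API.

```javascript
// Toy broker holding one message; anything still in `queue` after a crash is redelivered.
function runConsumer({ ackFirst, crashPoint }) {
  const queue = ['charge-card'];
  const sideEffects = []; // stands in for the real work (emails sent, cards charged)

  function attempt(crash) {
    const msg = queue[0];
    if (ackFirst) queue.shift();         // at-most-once: ack before working
    if (crash === 'before-work') return; // simulated crash
    sideEffects.push(msg);               // do the work
    if (crash === 'after-work') return;  // simulated crash before the late ack
    if (!ackFirst) queue.shift();        // at-least-once: ack only after success
  }

  attempt(crashPoint);                 // first delivery, may crash
  if (queue.length > 0) attempt(null); // consumer restarts; broker redelivers
  return sideEffects;
}

const lost = runConsumer({ ackFirst: true, crashPoint: 'before-work' });
const duplicated = runConsumer({ ackFirst: false, crashPoint: 'after-work' });
const clean = runConsumer({ ackFirst: false, crashPoint: null });
console.log(lost.length, duplicated.length, clean.length); // 0 2 1
```

Acking first loses the message on a crash; acking last repeats the work on a crash. Neither ordering gives exactly one execution in every failure case.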

The practical consequence: design every consumer to be idempotent, because at-least-once means duplicates will happen. The mechanics for that are the next big topic.
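
As a preview, one common way to get idempotency is to remember which message IDs have already been handled. This is only a sketch under assumptions: it presumes each message carries a unique `id` field, and it keeps the `processed` set in memory, whereas a real consumer would need a durable store updated atomically with the side effect.

```javascript
// Dedupe by message ID so a redelivered message has no second effect.
const processed = new Set(); // assumption: in production this is a durable store
const charges = [];          // stands in for the real side effect

function handle(message) {
  if (processed.has(message.id)) return false; // duplicate delivery: skip the work
  charges.push(message.amount);
  processed.add(message.id);
  return true;
}

const deliveries = [
  { id: 'm1', amount: 20 },
  { id: 'm2', amount: 35 },
  { id: 'm1', amount: 20 }, // redelivery of m1 after a missed ack
];
const results = deliveries.map((m) => handle(m));
console.log(results, charges); // [ true, true, false ] [ 20, 35 ]
```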

Ordering, partitions, and consumer groups

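Ordering is only guaranteed within a single partition (Kafka) or within a message group of a FIFO queue (SQS). Across partitions, there is no ordering guarantee. So if order matters (say, all events for one user), route them to the same partition by a key such as userId. The producer hashes the key to pick the partition, so as long as the partition count doesn’t change, the same key always lands in the same partition and stays ordered.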
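
Key-based routing can be sketched as a hash of the key modulo the partition count. The `hashKey` and `partitionFor` helpers are hypothetical, and the hash is a simple FNV-1a style function chosen for brevity; Kafka's Java client hashes keys with murmur2 instead, so the partition numbers here will not match a real cluster.

```javascript
// FNV-1a style 32-bit hash over the key's UTF-16 code units (illustrative only).
function hashKey(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply, then force unsigned 32-bit
  }
  return h;
}

// Same key and same partition count always give the same partition.
function partitionFor(key, numPartitions) {
  return hashKey(key) % numPartitions;
}

const events = ['user-42', 'user-7', 'user-42', 'user-42'];
const placements = events.map((key) => partitionFor(key, 6));
console.log(placements);
```

Every `user-42` event lands on one partition, so those events keep their relative order. Changing the partition count changes the mapping, which is one reason to pick the count carefully up front.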

Consumer groups are how you scale reads while preserving that order. Kafka assigns each partition to exactly one consumer in a group, so you parallelize across partitions without two consumers fighting over the same one. Want more parallelism? Add partitions, then consumers: a group can’t usefully have more active consumers than partitions, and any extras sit idle. Want a second independent reader of everything (analytics alongside billing)? Use a different consumer group: it gets its own offsets and reads the full log.
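
The one-partition-one-consumer rule can be sketched with a round-robin assignment. The `assign` function is a hypothetical simplification; Kafka's real assignors (range, round-robin, sticky) also handle rebalancing and multiple topics.

```javascript
// Give each partition to exactly one consumer in the group, round-robin.
function assign(partitions, consumers) {
  const assignment = new Map(consumers.map((c) => [c, []]));
  partitions.forEach((p, i) => {
    const owner = consumers[i % consumers.length];
    assignment.get(owner).push(p);
  });
  return assignment;
}

// Four partitions, two consumers: each consumer owns two partitions.
const balanced = assign([0, 1, 2, 3], ['c1', 'c2']);
// Two partitions, three consumers: the third consumer gets nothing and idles.
const overProvisioned = assign([0, 1], ['c1', 'c2', 'c3']);

console.log(Object.fromEntries(balanced));        // { c1: [ 0, 2 ], c2: [ 1, 3 ] }
console.log(Object.fromEntries(overProvisioned)); // { c1: [ 0 ], c2: [ 1 ], c3: [] }
```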

[Figure: Message Queues and Brokers architecture diagram]

Backlog and replay

Because a log retains messages, a slow or crashed consumer simply builds lag (its offset falls behind the head) and catches up later — nothing is lost. And because you can rewind the offset, you can replay history: reprocess a day of events after fixing a bug, or bootstrap a brand-new consumer from the beginning. A broker, by contrast, has no replay — once a message is acked and deleted, it’s gone.
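
Lag and replay both fall out of the fact that an offset is just a number the consumer group owns. The sketch below uses an in-memory array as the log; `lag`, `poll`, and `rewind` are hypothetical helpers for illustration, not client-library calls.

```javascript
const log = ['e0', 'e1', 'e2', 'e3', 'e4'];   // retained records; head is log.length
const offsets = { billing: 5, analytics: 2 }; // next offset each group will read

// Lag is how far a group's offset trails the head of the log.
function lag(group) {
  return log.length - offsets[group];
}

// Read forward from the group's offset until it reaches the head.
function poll(group, handler) {
  while (offsets[group] < log.length) {
    handler(log[offsets[group]]);
    offsets[group] += 1;
  }
}

// Replay is nothing more than moving the offset backwards.
function rewind(group, offset) {
  offsets[group] = offset;
}

const lagBefore = lag('analytics'); // 3: the group fell behind while it was down
const caughtUp = [];
poll('analytics', (r) => caughtUp.push(r));

rewind('billing', 0); // e.g. reprocess everything after fixing a bug
const replayed = [];
poll('billing', (r) => replayed.push(r));
console.log(lagBefore, caughtUp, replayed.length); // 3 [ 'e2', 'e3', 'e4' ] 5
```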

The JavaScript angle

From Node, you produce and consume with client libraries: kafkajs for Kafka, amqplib for RabbitMQ, the AWS SDK for SQS. The shape is always the same — a producer publishes, a long-running worker consumes and acks.

Producing and consuming with kafkajs (script.js)
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'orders', brokers: ['localhost:9092'] });

// Producer: key by userId so a user's events keep order in one partition.
const producer = kafka.producer();
await producer.connect();
await producer.send({
  topic: 'orders',
  messages: [{ key: 'user-42', value: JSON.stringify({ orderId: 'o1' }) }],
});

// Consumer: part of a group, so partitions are split across instances.
const consumer = kafka.consumer({ groupId: 'fulfillment' });
await consumer.connect();
await consumer.subscribe({ topic: 'orders', fromBeginning: false });
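// Hypothetical handler: stands in for your real fulfillment logic.
async function fulfill(order) {
  console.log('fulfilling', order.orderId);
}

await consumer.run({
  eachMessage: async ({ message }) => {
    if (message.value === null) return; // no payload (e.g. a tombstone): nothing to do
    const order = JSON.parse(message.value.toString());
    await fulfill(order); // make this idempotent: redelivery can happen
  },
});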

The queue gives one producer many independent consumers, but it’s still a work-distribution tool: each message goes to one consumer in a group. When you instead want every subscriber to get every message, you want publish/subscribe — next.