Message Queues and Brokers

Kafka vs RabbitMQ vs SQS, at-least-once vs at-most-once delivery, ordering, consumer groups and partitions, and when to introduce a queue at all.

10 min read · Level 4/5 · #system-design #message-queue #kafka

What you'll learn

Distinguish a log (Kafka) from a broker (RabbitMQ) and a managed queue (SQS)
Reason about delivery guarantees, ordering, and consumer groups
Decide when adding a queue actually pays off

A message queue sits between a producer and a consumer so they don’t have to be online at the same time, run at the same speed, or scale together. The producer drops a message and moves on; a consumer picks it up whenever it’s ready. That single indirection buys you decoupling, buffering against spikes, and resilience: if the consumer is down, messages wait instead of being lost.

When to introduce a queue

Don’t add a queue reflexively — it’s a new piece of infrastructure to operate. Reach for one when you see:

Slow work on the request path — sending email, transcoding video, charging a card. Push it off the synchronous path so the user isn’t waiting.
Spiky load — a flash sale produces 50× normal writes for ten minutes. A queue absorbs the spike and lets consumers drain at a steady rate (load leveling), instead of crushing the database.
Fan-out to many consumers — one event (“order placed”) needs to trigger billing, shipping, and analytics independently.
Crossing a reliability boundary — you want a retryable buffer between two services that fail independently.
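The load-leveling case above can be sketched with a small in-memory simulation. This is only an illustration under simple assumptions: the function name simulateLoadLeveling, the tick-based clock, and the arrival numbers are all made up for the example, and a real queue would be a separate durable service rather than an array.

```javascript
// Simulate load leveling: bursts arrive unevenly, the consumer drains at a fixed rate.
function simulateLoadLeveling(arrivalsPerTick, drainPerTick) {
  const queue = [];
  const depthPerTick = [];
  let processed = 0;
  let nextId = 0;
  let tick = 0;

  // Keep ticking until every arrival has been enqueued and the queue is empty.
  while (tick < arrivalsPerTick.length || queue.length > 0) {
    const arriving = tick < arrivalsPerTick.length ? arrivalsPerTick[tick] : 0;
    for (let i = 0; i < arriving; i++) queue.push({ id: nextId++ });

    // The consumer never takes more than drainPerTick, so downstream load stays flat.
    const batch = queue.splice(0, drainPerTick);
    processed += batch.length;

    depthPerTick.push(queue.length);
    tick++;
  }
  return { processed, depthPerTick, ticks: tick };
}

// Two quiet ticks around a spike of 50 writes per tick; the worker handles 10 per tick.
const result = simulateLoadLeveling([2, 50, 50, 2, 2], 10);
console.log(result.processed, Math.max(...result.depthPerTick), result.ticks);
```

With these numbers all 106 messages are processed, the backlog peaks at 80, and the queue is empty after 12 ticks. The database behind the consumer never sees more than 10 writes per tick; the cost is that messages from the spike wait longer before they are handled.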

[Figure: Message Queues and Brokers architecture diagram]

Two mental models: log vs broker

The biggest conceptual split in this space is log versus broker.

A broker (RabbitMQ, SQS) treats messages as transient work items. A message is delivered to a consumer, acknowledged, and deleted. The broker actively routes and tracks each message’s state. Think of it as a smart to-do list that hands out tasks and crosses them off.

A log (Kafka) is an append-only, ordered, durable sequence of records, split into partitions. Consumers don’t “take” messages — they read forward at their own offset, and the records stay for a retention window (hours, days, or forever). Multiple independent consumers can read the same log at different positions, and you can replay from any offset. Think of it as a durable event ledger rather than a task list.
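To make the two models concrete, here is a minimal in-memory sketch of each. The class names WorkQueue and AppendOnlyLog are hypothetical and do not correspond to any RabbitMQ, SQS, or Kafka API; they only mimic the lifecycle of a message in each model.

```javascript
// Broker-style queue: a message is handed out, then deleted once acknowledged.
class WorkQueue {
  constructor() {
    this.pending = [];         // published, not yet delivered
    this.inFlight = new Map(); // delivered, waiting for an ack
    this.nextTag = 0;
  }
  publish(body) {
    this.pending.push(body);
  }
  receive() {
    if (this.pending.length === 0) return null;
    const tag = this.nextTag++;
    const body = this.pending.shift();
    this.inFlight.set(tag, body);
    return { tag, body };
  }
  ack(tag) {
    this.inFlight.delete(tag); // after this, the message no longer exists
  }
}

// Log-style topic: records are appended and kept; each reader supplies its own offset.
class AppendOnlyLog {
  constructor() {
    this.records = [];
  }
  append(value) {
    this.records.push(value);
    return this.records.length - 1; // offset of the new record
  }
  read(fromOffset, maxRecords = 10) {
    return this.records.slice(fromOffset, fromOffset + maxRecords);
  }
}

const workQueue = new WorkQueue();
workQueue.publish("send-email");
const delivery = workQueue.receive();
workQueue.ack(delivery.tag);
console.log(workQueue.receive()); // null: the task was crossed off

const eventLog = new AppendOnlyLog();
["order-placed", "order-paid", "order-shipped"].forEach((e) => eventLog.append(e));
const billingView = eventLog.read(0);   // reads everything
const analyticsView = eventLog.read(1); // independent reader at a different position
console.log(billingView.length, analyticsView[0]);
```

Reading from the log does not remove anything, which is why two readers can hold different positions in the same data. The work queue has no equivalent of read(0) once a message has been acked.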

	Kafka (log)	RabbitMQ (broker)	SQS (managed queue)
Model	Append-only log	Routing broker	Managed broker
After consumption	Retained (replayable)	Deleted on ack	Deleted when the consumer deletes it
Ordering	Per partition	Per queue (best effort)	FIFO queues only
Throughput	Very high	High	High (elastic)
Replay	Yes (by offset)	No	No
Routing logic	Minimal (topic/partition)	Rich (exchanges)	Minimal
Ops burden	High (self-run)	Medium	None (AWS-managed)

A quick way to choose: Kafka when you want a durable, replayable stream that many systems read (event sourcing, analytics, audit). RabbitMQ when you want rich routing and per-message work semantics. SQS when you want a dead-simple managed queue and don’t want to run a broker at all.

Delivery guarantees

No distributed queue can promise true exactly-once delivery (we’ll prove why in the idempotency lesson). What they offer is:

At-most-once — deliver and forget; if the consumer crashes mid-process, the message is lost. Acceptable only when losing a message is fine (e.g., a best-effort metric).
At-least-once — the message is redelivered until the consumer acks success. If the consumer crashes after doing the work but before acking, you get a duplicate. This is the common, safe default.

The practical consequence: design every consumer to be idempotent, because at-least-once means duplicates will happen. The mechanics for that are the next big topic.
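The duplicate scenario is easy to reproduce in a toy simulation. The sketch below assumes a single message and a simulated crash between doing the work and sending the ack; deliverAtLeastOnce, the handlers, and the in-memory Set are illustrative names, and a real consumer would need a durable store for processed ids rather than process memory.

```javascript
// At-least-once: keep redelivering until the handler ran AND the ack got through.
function deliverAtLeastOnce(message, handler, crashBeforeAckOnAttempt) {
  let attempt = 0;
  while (true) {
    attempt++;
    handler(message); // the work happens here
    if (attempt === crashBeforeAckOnAttempt) {
      continue; // simulated crash: work done, ack never sent, so the message is redelivered
    }
    return attempt; // ack received, redelivery stops
  }
}

// Naive consumer: every delivery charges the card again.
let naiveCharges = 0;
const naiveHandler = () => {
  naiveCharges++;
};

// Idempotent consumer: remembers processed message ids and skips repeats.
const processedIds = new Set();
let idempotentCharges = 0;
const idempotentHandler = (msg) => {
  if (processedIds.has(msg.id)) return; // duplicate delivery, nothing to do
  processedIds.add(msg.id);
  idempotentCharges++;
};

const chargeMessage = { id: "charge-o1", amount: 30 };
const attempts = deliverAtLeastOnce(chargeMessage, naiveHandler, 1);
deliverAtLeastOnce(chargeMessage, idempotentHandler, 1);
console.log({ attempts, naiveCharges, idempotentCharges });
```

Both consumers receive the message twice, but only the naive one charges twice. The dedup check keyed on a stable message id is what turns at-least-once delivery into effectively-once processing.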

Ordering, partitions, and consumer groups

Ordering is only guaranteed within a single partition (Kafka) or within a message group of a FIFO queue (SQS). Across partitions, all bets are off. So if order matters (say, all events for one user), route those events to the same partition by a key (e.g., hash of userId). Same key, same partition, ordered.
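Key-based routing boils down to hashing the key and taking it modulo the partition count. The sketch below uses FNV-1a purely as a stand-in hash; it is an assumption for illustration and is not the partitioner that Kafka clients actually ship with. The function names fnv1a and partitionFor are likewise made up for the example.

```javascript
// Illustrative 32-bit FNV-1a hash over UTF-16 code units (good enough for a sketch).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit int
  }
  return hash;
}

// Same key always yields the same partition, as long as numPartitions is unchanged.
function partitionFor(key, numPartitions) {
  return fnv1a(key) % numPartitions;
}

const incomingEvents = [
  { userId: "user-42", type: "created" },
  { userId: "user-7", type: "created" },
  { userId: "user-42", type: "paid" },
];

// Route each event into one of four in-memory partitions, preserving arrival order.
const routedPartitions = Array.from({ length: 4 }, () => []);
for (const event of incomingEvents) {
  routedPartitions[partitionFor(event.userId, 4)].push(event);
}

const user42Partition = routedPartitions[partitionFor("user-42", 4)];
console.log(user42Partition.filter((e) => e.userId === "user-42").map((e) => e.type));
```

Both events for user-42 land in one partition in the order they were produced. Note that the mapping depends on the partition count, so with a hash-modulo scheme like this one, changing the number of partitions sends existing keys to different partitions.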

Consumer groups are how you scale reads while preserving that order. Kafka assigns each partition to exactly one consumer in a group, so you parallelize across partitions without two consumers fighting over the same one. Want more parallelism? Add partitions, and consumers to match: a group can never have more active consumers than partitions, so any extras sit idle. Want a second independent reader of everything (analytics alongside billing)? Use a different consumer group: it gets its own offsets and reads the full log.
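The one-partition-one-owner rule can be shown with a simplified assignment function. This is a plain round-robin sketch, not Kafka's real assignment logic, which is configurable and also handles rebalancing; assignPartitions and the consumer ids are hypothetical.

```javascript
// Simplified round-robin: each partition gets exactly one owner within the group.
function assignPartitions(partitionCount, consumerIds) {
  if (consumerIds.length === 0) {
    throw new Error("a consumer group needs at least one member");
  }
  const assignment = new Map(consumerIds.map((id) => [id, []]));
  for (let p = 0; p < partitionCount; p++) {
    const owner = consumerIds[p % consumerIds.length];
    assignment.get(owner).push(p);
  }
  return assignment;
}

// Six partitions across three consumers: two partitions each.
const threeWorkers = assignPartitions(6, ["c1", "c2", "c3"]);
console.log(Object.fromEntries(threeWorkers));

// Two partitions across three consumers: one consumer has nothing to read.
const tooManyWorkers = assignPartitions(2, ["c1", "c2", "c3"]);
console.log(Object.fromEntries(tooManyWorkers));
```

In the first case c1 owns partitions 0 and 3, c2 owns 1 and 4, and c3 owns 2 and 5. In the second case c3 is assigned nothing, which is why the partition count sets the ceiling on parallelism within one group.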

Backlog and replay

Because a log retains messages, a slow or crashed consumer simply builds lag (its offset falls behind the head) and catches up later — nothing is lost. And because you can rewind the offset, you can replay history: reprocess a day of events after fixing a bug, or bootstrap a brand-new consumer from the beginning. A broker, by contrast, has no replay — once a message is acked and deleted, it’s gone.
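Lag and replay are both just arithmetic on offsets. The following in-memory sketch models one partition and a per-group committed offset; the names partitionLog, committedOffsets, appendRecord, lagOf, pollGroup, and rewindGroup are invented for the example and are not client library calls.

```javascript
const partitionLog = [];            // retained records for a single partition
const committedOffsets = new Map(); // groupId -> next offset that group will read

function appendRecord(value) {
  partitionLog.push(value);
}

// Lag = how far the group's position is behind the head of the log.
function lagOf(groupId) {
  return partitionLog.length - (committedOffsets.get(groupId) ?? 0);
}

// Read everything from the committed offset to the head, then commit the new position.
function pollGroup(groupId, handler) {
  let offset = committedOffsets.get(groupId) ?? 0;
  let handled = 0;
  while (offset < partitionLog.length) {
    handler(partitionLog[offset], offset);
    offset++;
    handled++;
  }
  committedOffsets.set(groupId, offset);
  return handled;
}

// Replay: move the committed offset backwards; the records are still there.
function rewindGroup(groupId, offset) {
  committedOffsets.set(groupId, offset);
}

["e0", "e1", "e2", "e3", "e4"].forEach(appendRecord);

const lagBefore = lagOf("billing");                 // consumer was down: 5 behind
const firstPass = pollGroup("billing", () => {});   // catches up
const lagAfter = lagOf("billing");

rewindGroup("billing", 2);                          // reprocess after a bug fix
const replayed = [];
pollGroup("billing", (record) => replayed.push(record));

const freshGroupLag = lagOf("analytics");           // new group starts from the beginning
console.log({ lagBefore, firstPass, lagAfter, replayed, freshGroupLag });
```

The billing group starts 5 records behind, catches up to a lag of 0, and after the rewind reprocesses e2, e3, and e4. The analytics group has never committed anything, so it sees the full log as backlog.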

The JavaScript angle

From Node, you produce and consume with client libraries: kafkajs for Kafka, amqplib for RabbitMQ, the AWS SDK for SQS. The shape is always the same: a producer publishes, and a long-running worker consumes and acknowledges (in Kafka's case, by committing offsets).

Producing and consuming with kafkajs (script.js)

import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'orders', brokers: ['localhost:9092'] });

// Placeholder handler (not part of kafkajs): swap in real fulfillment logic.
async function fulfill(order) {
  console.log('fulfilling order', order.orderId);
}

// Producer: key by userId so a user's events keep order in one partition.
const producer = kafka.producer();
await producer.connect();
await producer.send({
  topic: 'orders',
  messages: [{ key: 'user-42', value: JSON.stringify({ orderId: 'o1' }) }],
});

// Consumer: part of a group, so partitions are split across instances.
const consumer = kafka.consumer({ groupId: 'fulfillment' });
await consumer.connect();
// fromBeginning: false means a brand-new group starts at the latest offset.
await consumer.subscribe({ topic: 'orders', fromBeginning: false });
await consumer.run({
  eachMessage: async ({ message }) => {
    if (!message.value) return; // skip records with an empty value
    const order = JSON.parse(message.value.toString());
    await fulfill(order); // make this idempotent: redelivery can happen
  },
});


The queue gives one producer many independent consumers, but it’s still a work-distribution tool: each message goes to one consumer in a group. When you instead want every subscriber to get every message, you want publish/subscribe — next.