WebSockets at Scale in Node

Why stateful socket connections break naive load balancing, how sticky sessions help, and how a Redis pub/sub adapter fans messages out across instances.

What you'll learn

Explain why in-memory socket state defeats horizontal scaling
Apply sticky sessions to keep a client pinned to one instance
Use a Redis pub/sub adapter to fan messages out across all instances

WebSockets are easy to start and hard to scale — and the reason is the most important sentence in this whole section: a WebSocket connection lives in the memory of one specific server process. Everything difficult about real-time at scale flows from that single fact. This is the flagship Node lesson; we’ll build the problem up and then solve it.

The in-memory connection problem

A single Node process can comfortably hold tens of thousands of open sockets. When a client connects, you keep a reference to its socket in memory so you can push to it later:

The naive single-server chat — works until you add a second box script.js

import WebSocket, { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });
const clients = new Set();           // ⚠️ lives in THIS process's memory only

wss.on('connection', (socket) => {
  clients.add(socket);
  socket.on('message', (data, isBinary) => {
    // Broadcast to everyone... connected to THIS instance.
    for (const c of clients) {
      // Skip sockets that are not open; keep text frames as text.
      if (c.readyState === WebSocket.OPEN) c.send(data, { binary: isBinary });
    }
  });
  socket.on('close', () => clients.delete(socket));
});

This is correct on one server and silently broken on two. If Alice’s socket lives on instance A and Bob’s lives on instance B, Alice’s message broadcasts to the clients set on A — which doesn’t contain Bob. The room is split across processes that can’t see each other’s connections.
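The split can be reproduced without any network at all. The sketch below is a simulation only: `fakeSocket` and `makeInstance` are hypothetical helpers invented for illustration, standing in for real sockets and for two separate copies of the server above.

```javascript
// Hypothetical stand-in for a socket: it just records what it was sent.
function fakeSocket(name) {
  return { name, inbox: [], send(data) { this.inbox.push(data); } };
}

// Each "instance" owns a private Set, like `clients` in the server above.
function makeInstance() {
  const clients = new Set();
  return {
    connect(socket) { clients.add(socket); },
    broadcast(data) { for (const c of clients) c.send(data); },
  };
}

const instanceA = makeInstance();
const instanceB = makeInstance();

const alice = fakeSocket('alice');
const bob = fakeSocket('bob');
instanceA.connect(alice); // the load balancer happened to send Alice to A
instanceB.connect(bob);   // ...and Bob to B

// Alice's message is broadcast by the instance that holds her socket.
instanceA.broadcast('hi from alice');

console.log(alice.inbox); // [ 'hi from alice' ]
console.log(bob.inbox);   // []  (Bob never hears it)
```

Nothing throws and nothing is logged as an error, which is why this failure tends to go unnoticed until a user reports a missing message.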

Why you can’t naively load-balance stateful sockets

Stateless HTTP scales horizontally because any server can handle any request — the load balancer sprays requests across the fleet and nobody cares which box answers. WebSockets break that assumption twice:

The connection is pinned. Once the Upgrade handshake completes, that client is bound to that one process for the life of the socket. The LB can’t move an open connection to a less-busy box.
The state is local. As we just saw, the set of who’s-connected-where lives in per-process memory. No single instance has the full picture.

WebSockets at Scale in Node — architecture diagram

So we have two distinct problems: (1) getting a client’s handshake to land on a server that can keep it, and (2) getting a message from a socket on A out to sockets on B and C. They have two different solutions.

Problem 1: sticky sessions

The handshake problem is solved at the load balancer with sticky sessions (aka session affinity): the LB identifies the client (by hashing its IP or by setting an affinity cookie) so every request from that client, including the long-lived Upgrade, routes to the same instance. This matters most with libraries like Socket.IO, which by default open with several HTTP long-polling requests before upgrading to WebSocket; if those requests land on different instances, the receiving instance does not recognise the session ID and the handshake fails.
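To make the IP-hash variant concrete, here is a minimal sketch of the routing decision. It is an illustration, not how any particular load balancer is implemented: `hashString`, `pickInstance`, and the instance names are hypothetical, and the hash used is FNV-1a, chosen only because it is short.

```javascript
// 32-bit FNV-1a string hash (an illustrative choice, not what real LBs use).
function hashString(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return h;
}

// The same client IP always maps to the same entry in `instances`.
function pickInstance(clientIp, instances) {
  return instances[hashString(clientIp) % instances.length];
}

// Hypothetical backend list and client address.
const instances = ['app-1:3000', 'app-2:3000', 'app-3:3000'];
const first = pickInstance('203.0.113.7', instances);
const second = pickInstance('203.0.113.7', instances);
console.log(first === second); // true
```

One consequence of plain modulo hashing is that changing the number of instances remaps many clients, which is one reason cookie-based affinity is a common alternative.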

Problem 2: fan-out with a Redis pub/sub adapter

To get a message from a socket on instance A to sockets on B and C, you need a shared message bus the instances all subscribe to. Redis pub/sub is the canonical choice, and Socket.IO ships an adapter that wires it up for you.

The idea: when any instance emits to a room, it publishes that emit to Redis; every instance is subscribed, receives the message, and re-emits it to the matching local sockets. No instance needs to know where any socket physically lives — Redis is the meeting point.
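The publish-then-re-emit flow described above can be sketched in a single process. This is a simplified simulation under stated assumptions: `makeBus` is a hypothetical in-memory stand-in for Redis pub/sub, `fakeSocket` and `makeInstance` are invented helpers, and every emit (including to the sender's own local sockets) goes through the bus, which a real adapter may handle differently.

```javascript
// Hypothetical in-process stand-in for Redis pub/sub.
function makeBus() {
  const subscribers = new Set();
  return {
    subscribe(handler) { subscribers.add(handler); },
    publish(message) { for (const h of subscribers) h(message); },
  };
}

// Hypothetical stand-in for a socket: it just records what it was sent.
function fakeSocket(name) {
  return { name, inbox: [], send(data) { this.inbox.push(data); } };
}

function makeInstance(bus) {
  const rooms = new Map(); // room name -> Set of sockets held by THIS instance
  // Every instance listens on the bus and delivers only to its local sockets.
  bus.subscribe(({ room, data }) => {
    for (const s of rooms.get(room) ?? []) s.send(data);
  });
  return {
    join(room, socket) {
      if (!rooms.has(room)) rooms.set(room, new Set());
      rooms.get(room).add(socket);
    },
    // Emitting publishes to the bus; no instance needs to know who holds whom.
    emitToRoom(room, data) { bus.publish({ room, data }); },
  };
}

const bus = makeBus();
const instanceA = makeInstance(bus);
const instanceB = makeInstance(bus);

const alice = fakeSocket('alice');
const bob = fakeSocket('bob');
const carol = fakeSocket('carol');
instanceA.join('lobby', alice);
instanceB.join('lobby', bob);
instanceB.join('other', carol);

instanceA.emitToRoom('lobby', 'hi');

console.log(alice.inbox); // [ 'hi' ]
console.log(bob.inbox);   // [ 'hi' ]  (delivered by instance B)
console.log(carol.inbox); // []        (different room)
```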

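In Socket.IO this is a few lines: attach the Redis adapter and keep writing code as if you had one server. The adapter intercepts every broadcast (an emit to a room or to all clients) and publishes it to Redis so the other instances can deliver it to their own sockets.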

Socket.IO with the Redis adapter — emit reaches every instance script.js

import { createServer } from 'node:http';
import { Server } from 'socket.io';
import { createAdapter } from '@socket.io/redis-adapter';
import { createClient } from 'redis';

const httpServer = createServer();
const io = new Server(httpServer);

// Each app instance opens its own pub + sub connection to the same shared Redis server.
const pubClient = createClient({ url: 'redis://localhost:6379' });
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);

// This is the whole fix: emits now fan out across ALL instances.
io.adapter(createAdapter(pubClient, subClient));

io.on('connection', (socket) => {
  socket.on('join', (room) => socket.join(room));
  socket.on('chat', (room, msg) => {
    // Reaches every socket in `room`, even ones on OTHER instances.
    io.to(room).emit('chat', msg);
  });
});

httpServer.listen(3000);

The payoff: your application code is the same io.to(room).emit(...) you would write for a single Socket.IO server, but it now correctly reaches Bob on instance B. The adapter handles the publish/subscribe plumbing; you handle the chat logic.

Putting it together

A production WebSocket tier therefore needs three pieces working in concert:

Piece	Problem it solves	Typical tool
Sticky sessions	Keep a client’s connection on one instance	LB affinity (cookie/IP hash)
Redis pub/sub adapter	Fan a message out to sockets on all instances	`@socket.io/redis-adapter`
Horizontal app tier	Hold more concurrent sockets	N identical Node instances

With those three, you scale real-time the same way you scale anything else: add instances behind the LB. The sockets stay sticky, Redis glues the instances together, and emit Just Works fleet-wide.

We solved live fan-out, but we explicitly skipped durability — what if the recipient is offline, or the work triggered by a message must not be lost? That’s the job of message queues, next.