Load Balancing

L4 vs L7 load balancing, the core distribution algorithms, health checks, and why sticky sessions are a smell — with a Node app sitting behind the balancer.


What you'll learn

Distinguish L4 (transport) from L7 (application) load balancing
Choose an appropriate distribution algorithm for a workload
Explain health checks and why sticky sessions signal hidden state

Once you scale out to many instances, something has to sit in front and decide which instance each request goes to. That something is a load balancer. It’s the single address clients talk to, and behind it sits a pool of interchangeable servers that the client never sees. Done right, ten servers look like one — and the failure of any one is invisible.

L4 vs L7: how deep does it look?

Load balancers operate at one of two layers of the network stack, and the choice shapes what they can do.

Layer 4 (transport). An L4 balancer routes based on TCP/UDP information (IP addresses and ports) without ever looking at the request contents. It never parses the application payload, and for HTTPS it typically passes the encrypted bytes through without decrypting them, so the URL and headers are invisible to it. It just forwards packets or connections. That makes it extremely fast and protocol-agnostic, but limited: it can’t route /api to one pool and /images to another.

Layer 7 (application). An L7 balancer terminates the connection and reads the actual HTTP request — method, path, headers, cookies. That visibility lets it do smart things: route by URL, split traffic by header, do TLS termination, retry failed requests, even rewrite responses. The cost is more CPU per request and the need to decrypt traffic.

	L4 (transport)	L7 (application)
Sees	IP, port, TCP/UDP	Full HTTP: path, headers, cookies
Speed	Very fast, low overhead	Slower — parses & often decrypts
Routing	By connection	By URL, header, cookie, content
TLS	Usually pass-through	Can terminate TLS
Examples	AWS NLB, IPVS	AWS ALB, nginx, HAProxy, Envoy

In practice most app traffic goes through an L7 balancer because the smart routing is worth it. Reach for L4 when you need raw throughput, non-HTTP protocols, or end-to-end encryption with no decryption in the middle.
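
To make the routing difference concrete, here is a minimal sketch of the path-based decision an L7 balancer can make and an L4 balancer cannot. The pool addresses and the `pickPool` helper are hypothetical, and the prefix match is deliberately naive; real balancers such as nginx or HAProxy express this in configuration rather than application code.

```javascript
// Hypothetical routing table: path prefix -> backend pool.
const routes = [
  { prefix: '/api',    pool: ['10.0.1.10:3000', '10.0.1.11:3000'] },
  { prefix: '/images', pool: ['10.0.2.10:8080'] },
];
const defaultPool = ['10.0.3.10:3000'];

// An L7 balancer has parsed the HTTP request, so it can inspect the path.
// An L4 balancer only sees IPs and ports and could not run this function.
function pickPool(path) {
  const match = routes.find((route) => path.startsWith(route.prefix));
  return match ? match.pool : defaultPool;
}

console.log(pickPool('/api/users'));       // the /api pool
console.log(pickPool('/images/logo.png')); // the /images pool
console.log(pickPool('/about'));           // the default pool
```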

Distribution algorithms

Given a healthy pool, how does the balancer pick a server?

Round-robin. Hand requests out in rotation: 1, 2, 3, 1, 2, 3… Simple and fair when all servers and all requests are roughly equal.
Weighted round-robin. Give beefier servers a bigger share — a 16-core box gets weight 4, a 4-core box gets weight 1. Useful for heterogeneous pools.
Least connections. Send the next request to the server with the fewest active connections. Better than round-robin when request durations vary wildly (some requests hold a connection for seconds, others for milliseconds).
IP hash / consistent hashing. Hash a key (often the client IP) to pick a server, so the same client lands on the same server. Useful for cache locality, but it is a form of stickiness, which we’ll critique below. A plain hash-modulo scheme remaps most clients whenever the pool size changes; consistent hashing (a later lesson) minimizes that reshuffling when servers join or leave.
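
The four strategies above can be sketched as small picker functions. This is an illustration under assumptions, not how any particular balancer implements them: the function names are made up for this example, the weighted version uses naive list expansion (production balancers usually interleave picks more smoothly), and the least-connections picker assumes the caller keeps each server’s `active` count up to date.

```javascript
// Round-robin: rotate through the pool in order.
function roundRobin(servers) {
  let next = 0;
  return () => {
    const server = servers[next];
    next = (next + 1) % servers.length;
    return server;
  };
}

// Weighted round-robin (naive): repeat each server `weight` times, then rotate.
function weightedRoundRobin(weighted) {
  const expanded = weighted.flatMap(({ server, weight }) => Array(weight).fill(server));
  return roundRobin(expanded);
}

// Least connections: pick the server with the fewest active connections.
// Ties go to the server listed first.
function leastConnections(servers) {
  return () => servers.reduce((best, s) => (s.active < best.active ? s : best));
}

// IP hash: the same client IP maps to the same server while the pool is unchanged.
function ipHash(servers) {
  return (clientIp) => {
    let hash = 0;
    for (const ch of clientIp) {
      hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // keep as unsigned 32-bit
    }
    return servers[hash % servers.length];
  };
}

const rr = roundRobin(['a', 'b', 'c']);
console.log(rr(), rr(), rr(), rr()); // a b c a
```

Note that `ipHash` uses plain modulo, so adding or removing a server changes the mapping for most clients.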

Health checks: don’t route into a black hole

A pool member is only useful if it’s healthy, so the balancer continuously probes each instance and removes the sick ones from rotation. There are two flavors:

Passive — watch real traffic; if a server starts returning errors or timing out, eject it.
Active — periodically hit a dedicated endpoint (e.g. GET /healthz) and require a 200. This is the one you design for.

This is exactly the honest health endpoint from the availability lesson: it should report unhealthy when a critical dependency is unreachable, so the balancer stops sending traffic to a server that can’t actually do its job.

A health endpoint built for the load balancer (script.js)

import express from 'express';

// Placeholder clients so the example runs on its own. Swap in your real
// database and Redis clients, each exposing an async ping().
const db = { ping: async () => true };
const redis = { ping: async () => true };

const app = express();

// Active health check target. The LB polls this every few seconds.
app.get('/healthz', async (req, res) => {
  try {
    await db.ping();                 // critical dependency reachable?
    await redis.ping();
    res.status(200).json({ ok: true, pid: process.pid });
  } catch (err) {
    // 503 marks this instance unhealthy; the LB pulls it out of rotation
    // once its failure threshold is reached.
    res.status(503).json({ ok: false, error: err.message });
  }
});

const server = app.listen(3000);

// Drain gracefully on shutdown: stop accepting new connections and let
// in-flight requests finish before the process exits.
process.on('SIGTERM', () => {
  console.log('Draining…');
  server.close(() => process.exit(0));
});
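
For the balancer’s side of the same conversation, here is a rough sketch of how an active checker might decide when to eject and readmit an instance. It assumes the common pattern of requiring several consecutive failures before ejecting and several consecutive successes before readmitting; `createHealthTracker`, its option names, and the thresholds are all hypothetical, and the probe uses the global `fetch` available in Node 18 and later.

```javascript
// Tracks one backend's health from the balancer's point of view.
// `probe` is any async function that resolves to true when the backend answered 200.
function createHealthTracker(probe, { unhealthyAfter = 3, healthyAfter = 2 } = {}) {
  let healthy = true;
  let streak = 0; // consecutive results that disagree with the current state

  return {
    isHealthy: () => healthy,
    async check() {
      let ok = false;
      try {
        ok = await probe();
      } catch {
        ok = false; // connection refused, timeout, DNS failure, and so on
      }
      if (ok === healthy) {
        streak = 0;
      } else {
        streak += 1;
        const threshold = healthy ? unhealthyAfter : healthyAfter;
        if (streak >= threshold) {
          healthy = ok; // flip state only after enough consecutive results
          streak = 0;
        }
      }
      return healthy;
    },
  };
}

// Example probe: GET /healthz with a 2-second timeout. The base URL is a placeholder.
const probeHealthz = (baseUrl) => async () => {
  const res = await fetch(`${baseUrl}/healthz`, { signal: AbortSignal.timeout(2000) });
  return res.status === 200;
};

// Usage: poll every few seconds and route only to trackers that report healthy.
//   const tracker = createHealthTracker(probeHealthz('http://10.0.1.10:3000'));
//   setInterval(() => tracker.check(), 5000);
```

The thresholds trade detection speed against flapping: a single slow response does not eject a server, but a dead one is still removed after a few polling intervals.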


Sticky sessions, and why they’re a smell

A sticky session (session affinity) pins a given client to a specific server for the life of their session — usually via a cookie the balancer sets or by hashing the client IP. The pitch is “the server already has this user’s session in memory, so keep sending them back.”

That pitch is the problem. Stickiness is a band-aid over server-side state. It quietly undoes the benefits of scaling out:

Uneven load. Long-lived sticky clients pile onto whichever servers they first hit; the balancer can’t rebalance them.
Brittle failover. If a server dies, every client stuck to it loses their session — the exact failure redundancy was supposed to hide.
Painful deploys. You can’t cleanly drain and replace a server without disrupting the clients pinned to it.

The fix isn’t a smarter stickiness scheme — it’s removing the state that made stickiness necessary. Put sessions in a shared store (Redis), and any server can serve any request. Then round-robin freely, fail over cleanly, and deploy without fear. Stickiness should be a deliberate, rare optimization (e.g. for cache warmth), never the thing holding your sessions together.
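
As an illustration of that fix, the sketch below shows two servers that hold no session state of their own and share one store. Everything here is a stand-in: `createSharedStore` uses an in-memory Map so the example runs without dependencies, where a real deployment would use a networked store such as Redis, and the server and key names are hypothetical.

```javascript
// Stand-in for a shared store such as Redis. A Map keeps the sketch
// runnable on its own; the async interface mimics a network client.
function createSharedStore() {
  const data = new Map();
  return {
    async get(key) { return data.get(key) ?? null; },
    async set(key, value) { data.set(key, value); },
  };
}

// A "server" keeps no sessions in its own memory; it only knows the store.
function createServer(name, store) {
  return {
    async login(sessionId, user) {
      await store.set(`session:${sessionId}`, JSON.stringify({ user }));
      return `${name}: logged in ${user}`;
    },
    async whoami(sessionId) {
      const raw = await store.get(`session:${sessionId}`);
      return raw ? `${name}: hello ${JSON.parse(raw).user}` : `${name}: no session`;
    },
  };
}

async function demo() {
  const store = createSharedStore();
  const a = createServer('server-a', store);
  const b = createServer('server-b', store);

  await a.login('abc123', 'ada'); // request 1 lands on server-a
  return b.whoami('abc123');      // request 2 lands on server-b
}

demo().then(console.log); // server-b: hello ada
```

Because the session lives in the store, the balancer is free to send the second request anywhere, and losing server-a does not lose the session.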

The JavaScript angle: a Node pool behind the LB

For a Node service the balancer’s job is to spread requests across your worker processes and instances. Three things make Node play nicely behind one:

Be stateless. Sessions, carts, and “logged-in user” data live in Redis or a DB — not in a module-level variable. Then no stickiness is required.
Expose an honest /healthz. As above — fail it when a critical dependency is down so the LB ejects you instead of routing into errors.
Drain on SIGTERM. Stop accepting new connections, finish in-flight requests, then exit — so rolling deploys don’t drop requests.
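
One way to combine the last two points is to fail the health check first and close the listener a little later, so the balancer has a chance to stop sending traffic before new connections are refused. The sketch below assumes that approach; `createDrainer`, `noticeMs`, and `onExit` are hypothetical names, and the delay should be tuned to your balancer’s health check interval and failure threshold.

```javascript
// Wraps any object with a Node-style close(callback), such as an http.Server.
function createDrainer(server, { noticeMs = 5000, onExit = () => process.exit(0) } = {}) {
  let draining = false;
  return {
    isDraining: () => draining,
    drain() {
      if (draining) return; // ignore repeated signals
      draining = true;      // from now on /healthz should answer 503
      setTimeout(() => {
        // Stop accepting new connections; the callback runs once
        // the remaining connections have ended.
        server.close(onExit);
      }, noticeMs);
    },
  };
}

// Wiring, assuming an Express `app` and `server = app.listen(3000)`:
//   const drainer = createDrainer(server);
//   app.get('/healthz', (req, res) => res.sendStatus(drainer.isDraining() ? 503 : 200));
//   process.on('SIGTERM', drainer.drain);
```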

Get those three right and your Node fleet behaves like one big, resilient server. The load balancer routes; the next box in the chain often does much more than route. That’s the reverse proxy — next.