Rate Limiting

Token bucket, leaky bucket, and the window algorithms — plus how to make a rate limiter correct across a fleet with Redis and an atomic counter.

9 min read Level 3/5 #system-design#rate-limiting#redis

What you'll learn

Compare token bucket, leaky bucket, and window-based limiters
Implement a token-bucket limiter in JavaScript
Make a distributed limiter atomic with Redis

A rate limiter answers one question: given who’s asking and how fast, do we let this request through or reject it? It’s the cheapest insurance you can buy. Without one, a single buggy client, a scraper, or a thundering retry storm can saturate your service and take everyone down with it. With one, abusive traffic hits a wall at the edge while legitimate users keep flowing.

The interesting part isn’t whether to limit — it’s which algorithm, because each one shapes traffic differently and each one fails differently at the edges of a window.

The four classic algorithms

Algorithm	How it works	Allows bursts?	Memory
Fixed window	Count requests per fixed time slot, reset on the boundary	Yes — double at boundaries	1 counter
Sliding window log	Store a timestamp per request, count those in the trailing window	No	O(requests)
Sliding window counter	Weight the previous window’s count into the current one	Smoothed	2 counters
Token bucket	Tokens refill at a steady rate; each request spends one	Yes, up to bucket size	2 numbers
Leaky bucket	Requests queue and drain at a fixed rate	No — smooths output	queue

Fixed window is the simplest — one counter per key per minute — but it has a nasty boundary bug: a client can send a full window’s worth of requests in the last second of one window and another full window in the first second of the next, getting 2× your limit in a two-second span.

Sliding window log fixes that by remembering every request’s timestamp and counting only those inside the trailing 60 seconds — perfectly accurate, but it stores a timestamp per request, which gets expensive under load.

Sliding window counter is the practical compromise: keep the current and previous window counts and interpolate, smoothing the boundary without storing per-request data.

Token bucket is the one you’ll reach for most. A bucket holds up to N tokens; tokens refill at a steady rate r per second; each request removes one token, and an empty bucket means rejection. It permits short bursts (drain the bucket) while enforcing a long-run average (the refill rate) — exactly the shape most APIs want.

Leaky bucket is its mirror: requests enter a queue and leak out at a fixed rate, smoothing bursty input into a steady stream — great when the thing downstream needs an even load, not just a cap.

The JavaScript angle: a token bucket in Node

The token bucket has a beautiful property — you don’t need a background timer ticking to add tokens. You compute how many would have refilled since you last looked, lazily, on each request. That makes it cheap and exact:

A lazy-refill token bucket script.js

class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;       // max burst size
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;         // start full
    this.last = Date.now();
  }

  tryRemove(cost = 1) {
    const now = Date.now();
    const elapsed = (now - this.last) / 1000;
    // Lazily refill based on time passed — no timer needed.
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.last = now;

    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;                  // allowed
    }
    return false;                   // rate limited
  }
}

// 10-request burst, refilling 5/sec → sustained 5 req/s, bursts up to 10.
const bucket = new TokenBucket(10, 5);
console.log(bucket.tryRemove()); // true

▶ Preview: console

That works perfectly — in one process. The moment you run three Node instances behind a load balancer, each has its own bucket, and a client spread across all three gets 3× the intended limit. State that’s local to a process can’t enforce a global limit.

Going distributed with Redis

The fix is to keep the counter in a shared store every instance can see — almost always Redis, because it’s fast and atomic. The naive fixed-window version is two commands:

Atomic fixed-window limit with INCR + EXPIRE script.js

import Redis from 'ioredis';
const redis = new Redis();

async function allow(userId, limit = 100, windowSec = 60) {
  const key = `rl:${userId}:${Math.floor(Date.now() / 1000 / windowSec)}`;
  const count = await redis.incr(key);     // atomic increment
  if (count === 1) await redis.expire(key, windowSec); // set TTL on first hit
  return count <= limit;
}

▶ Preview: console

There’s a subtle bug lurking: if the process dies between INCR and EXPIRE, the key never expires and that user is counted forever. The robust fix is to make the whole check-and-increment a single atomic operation with a Lua script, which Redis runs without interleaving anything else:

Telling the client what happened

A good limiter doesn’t just say no — it tells the caller how to behave. Return 429 Too Many Requests with standard headers so well-behaved clients can back off instead of hammering:

RateLimit-Limit — the ceiling
RateLimit-Remaining — tokens left
Retry-After — seconds until they should try again

That turns a blunt rejection into a contract clients can cooperate with — which is exactly what the next layer formalizes. Where does the limiter actually live? Usually at the API gateway, the front door we look at next.