Decide Who Gets Through When Everyone Shows Up at Once
Rate Limiting
Token bucket, leaky bucket, and the window algorithms — plus how to make a rate limiter correct across a fleet with Redis and an atomic counter.
What you'll learn
- Compare token bucket, leaky bucket, and window-based limiters
- Implement a token-bucket limiter in JavaScript
- Make a distributed limiter atomic with Redis
A rate limiter answers one question: given who’s asking and how fast, do we let this request through or reject it? It’s the cheapest insurance you can buy. Without one, a single buggy client, a scraper, or a thundering retry storm can saturate your service and take everyone down with it. With one, abusive traffic hits a wall at the edge while legitimate users keep flowing.
The interesting part isn’t whether to limit — it’s which algorithm, because each one shapes traffic differently and each one fails differently at the edges of a window.
The five classic algorithms
| Algorithm | How it works | Allows bursts? | Memory |
|---|---|---|---|
| Fixed window | Count requests per fixed time slot, reset on the boundary | Yes — double at boundaries | 1 counter |
| Sliding window log | Store a timestamp per request, count those in the trailing window | No | O(requests) |
| Sliding window counter | Weight the previous window’s count into the current one | Smoothed | 2 counters |
| Token bucket | Tokens refill at a steady rate; each request spends one | Yes, up to bucket size | 2 numbers |
| Leaky bucket | Requests queue and drain at a fixed rate | No — smooths output | queue |
Fixed window is the simplest — one counter per key per minute — but it has a nasty boundary bug: a client can send a full window’s worth of requests in the last second of one window and another full window in the first second of the next, getting 2× your limit in a two-second span.
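The boundary problem is easy to reproduce. The following is a minimal in-memory sketch, not a production limiter: the `FixedWindow` class and its `allow(now)` method are hypothetical names, and the timestamp is passed in by hand so the two bursts can be replayed deterministically.

```javascript
// Minimal in-memory fixed-window limiter (illustrative only).
// `now` is a timestamp in milliseconds supplied by the caller.
class FixedWindow {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowId = -1; // index of the window the counter belongs to
    this.count = 0;
  }

  allow(now) {
    const id = Math.floor(now / this.windowMs);
    if (id !== this.windowId) {
      // Crossed a boundary: the counter starts again from zero.
      this.windowId = id;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}

// Limit of 100 requests per 60-second window.
const fw = new FixedWindow(100, 60_000);
let accepted = 0;
for (let i = 0; i < 100; i++) if (fw.allow(59_500)) accepted++; // end of window 0
for (let i = 0; i < 100; i++) if (fw.allow(60_500)) accepted++; // start of window 1
console.log(accepted); // 200, all within one second of each other
```

Both bursts are individually within the limit, yet together they put 200 requests through in about a second.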
Sliding window log fixes that by remembering every request’s timestamp and counting only those inside the trailing 60 seconds — perfectly accurate, but it stores a timestamp per request, which gets expensive under load.
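As a rough illustration of that trade-off, here is a small in-memory sketch. The `SlidingWindowLog` name is hypothetical, timestamps are passed in as milliseconds, and it assumes rejected requests are not recorded (some implementations log them too).

```javascript
// Sliding window log: one stored timestamp per accepted request.
class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = []; // accepted request times, oldest first
  }

  allow(now) {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have aged out of the trailing window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

The array grows with the number of accepted requests inside the window, which is where the O(requests) memory cost comes from.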
Sliding window counter is the practical compromise: keep the current and previous window counts and interpolate, smoothing the boundary without storing per-request data.
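One common way to interpolate is to weight the previous window's count by the fraction of it that still overlaps the trailing window. The sketch below assumes that formula; the `SlidingWindowCounter` name is hypothetical and timestamps are passed in as milliseconds.

```javascript
// Sliding window counter: two counters plus a weighted estimate.
class SlidingWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowId = 0;
    this.curr = 0; // requests accepted in the current window
    this.prev = 0; // requests accepted in the previous window
  }

  allow(now) {
    const id = Math.floor(now / this.windowMs);
    if (id !== this.windowId) {
      // One step ahead keeps the old count as `prev`; a longer gap clears it.
      this.prev = id === this.windowId + 1 ? this.curr : 0;
      this.curr = 0;
      this.windowId = id;
    }
    // Fraction of the current window that has already elapsed.
    const elapsedFraction = (now % this.windowMs) / this.windowMs;
    const estimate = this.prev * (1 - elapsedFraction) + this.curr;
    if (estimate >= this.limit) return false;
    this.curr += 1;
    return true;
  }
}

// Same boundary attack as with a fixed window: 100 per 60 s.
const swc = new SlidingWindowCounter(100, 60_000);
let passed = 0;
for (let i = 0; i < 100; i++) if (swc.allow(59_500)) passed++;
for (let i = 0; i < 100; i++) if (swc.allow(60_500)) passed++;
console.log(passed); // 101
```

The second burst is almost entirely rejected because the previous window's 100 requests still count at roughly 99% weight.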
Token bucket is the one you’ll reach for most. A bucket holds up to N
tokens; tokens refill at a steady rate r per second; each request removes one
token, and an empty bucket means rejection. It permits short bursts (drain the
bucket) while enforcing a long-run average (the refill rate) — exactly the shape
most APIs want.
Leaky bucket is its mirror: requests enter a queue and leak out at a fixed rate, smoothing bursty input into a steady stream — great when the thing downstream needs an even load, not just a cap.
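A minimal way to sketch the leaky bucket is to assign each queued request a departure time instead of running a real timer. The `LeakyBucket` class and `schedule(now)` method below are hypothetical names, and timestamps are passed in as milliseconds.

```javascript
// Leaky bucket as a queue: each request gets a departure slot,
// and slots are spaced one leak interval apart.
class LeakyBucket {
  constructor(capacity, leakPerSec) {
    this.capacity = capacity;            // max requests waiting in the queue
    this.intervalMs = 1000 / leakPerSec; // gap between two departures
    this.pending = [];                   // departure times of queued requests
    this.lastDeparture = -Infinity;      // most recently assigned slot
  }

  // Returns the time (ms) at which the request is forwarded,
  // or null when the queue is full and the request overflows.
  schedule(now) {
    // Forget requests that have already leaked out.
    while (this.pending.length > 0 && this.pending[0] <= now) {
      this.pending.shift();
    }
    if (this.pending.length >= this.capacity) return null;
    const at = Math.max(now, this.lastDeparture + this.intervalMs);
    this.lastDeparture = at;
    this.pending.push(at);
    return at;
  }
}

// Queue of 3, draining 2 requests per second; five requests arrive at t = 0.
const leaky = new LeakyBucket(3, 2);
const slots = [0, 0, 0, 0, 0].map((t) => leaky.schedule(t));
console.log(slots); // [ 0, 500, 1000, 1500, null ]
```

In a server, each returned slot could drive something like `setTimeout(forward, at - now)`, so the burst reaches the downstream service as an even stream.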
The JavaScript angle: a token bucket in Node
The token bucket has a beautiful property — you don’t need a background timer ticking to add tokens. You compute how many would have refilled since you last looked, lazily, on each request. That makes it cheap and exact:
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity; // max burst size
    this.refillPerSec = refillPerSec; // tokens added per second
    this.tokens = capacity; // start full
    this.last = Date.now(); // time of the last refill, in ms
  }

  tryRemove(cost = 1) {
    const now = Date.now();
    // Clamp at zero so a backwards wall-clock jump never removes tokens.
    const elapsed = Math.max(0, now - this.last) / 1000;
    // Lazily refill based on time passed; no timer needed.
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.last = now;
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true; // allowed
    }
    return false; // rate limited
  }
}

// 10-request burst, refilling 5/sec → sustained 5 req/s, bursts up to 10.
const bucket = new TokenBucket(10, 5);
console.log(bucket.tryRemove()); // true

That works perfectly in one process. The moment you run three Node instances behind a load balancer, each has its own bucket, and a client whose requests are spread across all three can get up to 3× the intended limit. State that’s local to a process can’t enforce a global limit.
Going distributed with Redis
The fix is to keep the counter in a shared store every instance can see, almost always Redis, because it’s fast and executes each command atomically. The naive fixed-window version is two commands:
import Redis from 'ioredis';

const redis = new Redis();

async function allow(userId, limit = 100, windowSec = 60) {
  // One key per user per window; the window index changes every windowSec seconds.
  const key = `rl:${userId}:${Math.floor(Date.now() / 1000 / windowSec)}`;
  const count = await redis.incr(key); // atomic increment
  if (count === 1) await redis.expire(key, windowSec); // set TTL on first hit
  return count <= limit;
}

There’s a subtle bug lurking: if the process dies between INCR and EXPIRE,
the key never gets a TTL. Because the window index is part of the key name here,
the user isn’t blocked in later windows, but the orphaned counter stays in Redis
until something deletes it; with a key that omits the window index, that user
would be counted forever. The robust fix is to make the whole
check-and-increment a single atomic operation with a Lua script, which Redis
runs without interleaving anything else.
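A minimal sketch of such a script, assuming an ioredis-style client whose `eval(script, numKeys, ...keysAndArgs)` method sends the Lua source to Redis. The names `FIXED_WINDOW_LUA` and `allowAtomic` are hypothetical, and the client is passed in as a parameter so the function doesn't depend on a particular connection setup.

```javascript
// Lua source: increment the window counter and attach the TTL in one step.
// Redis executes the whole script atomically, so INCR and EXPIRE can no
// longer be separated by a crash in the calling process.
const FIXED_WINDOW_LUA = `
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
`;

// `redis` is assumed to be an ioredis client (or anything with the same
// eval signature). Returns true when the request is within the limit.
async function allowAtomic(redis, userId, limit = 100, windowSec = 60) {
  const windowIndex = Math.floor(Date.now() / 1000 / windowSec);
  const key = `rl:${userId}:${windowIndex}`;
  // 1 = number of KEYS; windowSec arrives in the script as ARGV[1].
  const count = await redis.eval(FIXED_WINDOW_LUA, 1, key, windowSec);
  return Number(count) <= limit;
}
```

Called as `await allowAtomic(redis, 'user-42')`, it behaves like the two-command version, except that a counter can never exist without its TTL.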
Telling the client what happened
A good limiter doesn’t just say no — it tells the caller how to behave. Return
429 Too Many Requests with standard headers so well-behaved clients can back
off instead of hammering:
- RateLimit-Limit: the ceiling
- RateLimit-Remaining: tokens left
- Retry-After: seconds until they should try again
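As a rough sketch of how those headers could be derived from token-bucket state, the helper below builds a status code and header map. The `rateLimitResponse` name and its argument shape are hypothetical; `capacity`, `tokens`, and `refillPerSec` are assumed to mirror the fields of the `TokenBucket` class.

```javascript
// Build response metadata for a token-bucket decision.
// `allowed` is the limiter's verdict; `cost` is the tokens the request needs.
function rateLimitResponse({ capacity, tokens, refillPerSec }, allowed, cost = 1) {
  const headers = {
    'RateLimit-Limit': String(capacity),
    'RateLimit-Remaining': String(Math.max(0, Math.floor(tokens))),
  };
  if (allowed) return { status: 200, headers };

  // Seconds until enough tokens have refilled to cover `cost`, rounded up
  // and never less than one second.
  const waitSec = Math.ceil((cost - tokens) / refillPerSec);
  headers['Retry-After'] = String(Math.max(1, waitSec));
  return { status: 429, headers };
}

const rejected = rateLimitResponse({ capacity: 10, tokens: 0, refillPerSec: 0.5 }, false);
console.log(rejected.status, rejected.headers['Retry-After']); // 429 2
```

With Node's built-in `http` module, the result can be passed straight to `res.writeHead(status, headers)`.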
That turns a blunt rejection into a contract clients can cooperate with — which is exactly what the next layer formalizes. Where does the limiter actually live? Usually at the API gateway, the front door we look at next.