Latency Numbers Every Engineer Should Know

The orders-of-magnitude latency table — cache vs RAM vs SSD vs network vs cross-region — and why blocking the Node event loop is the cardinal sin.

8 min read · Level 3/5 · #system-design #latency #performance

What you'll learn

Recall the relative cost of memory, disk, and network access
Reason about latency budgets across a request path
Connect latency to the Node.js event loop

To design for latency you need a feel for how long things take relative to each other. The absolute nanoseconds drift with each hardware generation, but the ratios barely change, and the ratios are what matter. Relative to an L1 cache reference, reading from RAM is ~100× slower, a disk seek is ~3,000,000× slower, and a California-to-Netherlands round trip is ~150,000,000× slower. Those gaps decide your architecture.

The table (orders of magnitude)

Operation	Approx latency	Relative to L1
L1 cache reference	~1 ns	1×
Branch mispredict	~3 ns	3×
L2 cache reference	~4 ns	4×
Mutex lock/unlock	~17 ns	17×
Main memory (RAM) reference	~100 ns	100×
Compress 1KB	~2,000 ns (2 µs)	2,000×
Read 1MB sequentially from RAM	~3 µs	3,000×
SSD random read	~16 µs	16,000×
Read 1MB sequentially from SSD	~50 µs	50,000×
Round trip within a datacenter	~500 µs	500,000×
Read 1MB from disk (HDD)	~2 ms	2,000,000×
Disk (HDD) seek	~3 ms	3,000,000×
Round trip CA ↔ Netherlands	~150 ms	150,000,000×

A few takeaways that recur all over system design:

Memory is ~30,000× faster than a disk seek. This is why caches exist — serving from RAM instead of disk is the single biggest latency win available.
A datacenter round trip (~0.5ms) is cheap; a cross-region round trip (~150ms) is not. Chatty cross-region calls are death by a thousand cuts.
Sequential beats random by orders of magnitude on both SSD and disk. This is why log-structured storage and append-only designs perform well.
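The ratios behind these takeaways can be checked with a few lines of arithmetic. This is a rough sketch using the approximate figures from the table; the 4 KB page size for a random read is an assumption (the table does not state a read size), and the HDD random figure counts seek time only.

```javascript
// Approximate latencies from the table, in nanoseconds.
const NS = {
  ram: 100,
  ssdRandomRead: 16_000,
  ssdSequential1MB: 50_000,
  hddSeek: 3_000_000,
  hddSequential1MB: 2_000_000,
};

// Assumption (not from the table): each random read fetches one 4 KB page,
// so reading 1 MB at random takes 256 separate reads.
const PAGE_BYTES = 4 * 1024;
const READS_PER_MB = (1024 * 1024) / PAGE_BYTES;

const ssdRandom1MB = READS_PER_MB * NS.ssdRandomRead; // ~4.1 ms
const hddRandom1MB = READS_PER_MB * NS.hddSeek;       // ~768 ms, seek time only

const ratios = {
  ramVsHddSeek: NS.hddSeek / NS.ram,                         // 30000
  ssdRandomVsSequential: ssdRandom1MB / NS.ssdSequential1MB, // 81.92
  hddRandomVsSequential: hddRandom1MB / NS.hddSequential1MB, // 384
};

console.log(ratios);
```

Under these assumptions, random access costs roughly two orders of magnitude more than sequential access for the same megabyte, with the gap wider on spinning disk than on SSD.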

Latency budgets

If your p99 target is 200ms, you have a budget to spend across the request path. Sketch where it goes:

[Figure: architecture diagram]

A cache hit over the network costs ~0.5ms (one datacenter round trip); a database query costs ~5–10ms; a cross-region hop costs ~150ms all by itself. If one external call in your path crosses an ocean, it consumes about three quarters of the 200ms budget on its own, and a second such hop overruns it. That is the argument for regional replicas and edge caching.
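One way to make the budget concrete is to add up the hops. The sketch below is illustrative only: the request paths and the `summarize` helper are hypothetical, and the per-hop costs reuse the approximate figures from this section.

```javascript
const BUDGET_MS = 200; // the p99 target used in the text

// Hypothetical request path that stays inside one region.
const sameRegionPath = [
  { hop: 'cache lookup', ms: 0.5 },
  { hop: 'database query', ms: 10 },
  { hop: 'service call in the same datacenter', ms: 0.5 },
];

// The same path with a single cross-region call added.
const crossRegionPath = [...sameRegionPath, { hop: 'cross-region call', ms: 150 }];

// Sum the hops and report how much of the budget is left.
function summarize(path, budgetMs) {
  const spentMs = path.reduce((sum, { ms }) => sum + ms, 0);
  return { spentMs, remainingMs: budgetMs - spentMs, overBudget: spentMs > budgetMs };
}

console.log(summarize(sameRegionPath, BUDGET_MS));  // spends 11 ms
console.log(summarize(crossRegionPath, BUDGET_MS)); // spends 161 ms
```

With these numbers the same-region path leaves 189ms of headroom, while one cross-region call leaves only 39ms for everything else, including your own code.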

The JavaScript angle: the event loop is single-threaded

For Node engineers, latency has a specific and dangerous flavor: Node runs your JavaScript on one thread. A slow synchronous operation doesn’t just slow that request — it blocks every concurrent request behind it, because nothing else can run until the event loop is free again.

The same work, but one of these blocks everyone (script.js):

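import express from 'express';   // assumption: `app` is an Express application
import { readFile, readFileSync } from 'node:fs';

const app = express();

// ❌ Synchronous: blocks the event loop. Every other in-flight
//    request waits the full read time. Latency spikes for all of them.
app.get('/bad', (req, res) => {
  const data = readFileSync('./big.json', 'utf8');   // event loop frozen here
  res.json(JSON.parse(data));
});

// ✅ Asynchronous: the read happens off-thread; the event loop keeps
//    serving other requests while it waits. No collateral latency from the read
//    (JSON.parse itself still runs on the main thread).
app.get('/good', (req, res) => {
  readFile('./big.json', 'utf8', (err, data) => {
    // Without this check, a failed read would pass undefined to JSON.parse.
    if (err) return res.status(500).json({ error: 'could not read big.json' });
    res.json(JSON.parse(data));
  });
});

app.listen(3000);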
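To see the collateral damage directly, you can measure how late a timer fires while synchronous work is holding the thread. This is a minimal sketch, not a benchmark: `blockFor`, `timerLag`, and `demo` are hypothetical helpers, and the busy-wait stands in for any synchronous operation such as a large `readFileSync`.

```javascript
// Busy-wait for `ms` milliseconds: a stand-in for synchronous, blocking work.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Resolve with how long a 0 ms timer actually took to fire, in milliseconds.
function timerLag() {
  const scheduled = performance.now();
  return new Promise((resolve) => {
    setTimeout(() => resolve(performance.now() - scheduled), 0);
  });
}

async function demo() {
  const idleLag = await timerLag(); // nothing else is running
  const pending = timerLag();       // schedule a timer...
  blockFor(200);                    // ...then hog the thread for about 200 ms
  const blockedLag = await pending; // the timer could not fire until the loop was free
  return { idleLag, blockedLag };
}

demo().then(({ idleLag, blockedLag }) => {
  console.log(`timer lag when idle: ${idleLag.toFixed(1)} ms`);
  console.log(`timer lag behind 200 ms of sync work: ${blockedLag.toFixed(1)} ms`);
});
```

The timer stands in for another client's request: it was ready to run almost immediately, but it had to wait for the blocking work to finish first.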


The lesson generalizes: a 50ms CPU-bound JSON parse, a tight loop over a huge array, synchronous crypto. Any of these holds the loop and turns one slow request into a latency event for every request that process is serving. When CPU work is unavoidable, move it off the main thread (worker_threads) or out of the request path (a queue); we cover both techniques later in the track.
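As a preview of the first option, here is a minimal sketch of pushing a CPU-bound loop onto a worker thread with `node:worker_threads`. The `sumOffThread` helper and the summing loop are hypothetical stand-ins for real work, and the worker source is kept inline (`eval: true`) only so the example fits in one file.

```javascript
// Worker source, run as a script on a separate thread.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  // CPU-bound stand-in: sum the integers 0..n-1 in a tight loop.
  let total = 0;
  for (let i = 0; i < workerData.n; i++) total += i;
  parentPort.postMessage(total);
`;

// Hypothetical helper: run the loop off the main thread and resolve with its result.
async function sumOffThread(n) {
  // Dynamic import works whether this file is loaded as CommonJS or as an ES module.
  const { Worker } = await import('node:worker_threads');
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// The main thread stays free to serve other requests while the worker computes.
sumOffThread(1e7).then((total) => console.log('sum:', total));
```

Starting a worker has its own cost, so this pays off for work measured in tens of milliseconds or more, not for trivial computations.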

With scale (last lesson), latency (this one), and the framework in hand, the final foundation is measuring whether the system is actually up: availability.