Backpressure & Node Streams

When the Producer Is Faster Than the Consumer

How backpressure keeps a fast producer from drowning a slow consumer, why unbounded buffering crashes a Node process, and how streams, pipeline, and async iterators give you flow control for free.

What you'll learn
  • Explain backpressure and why unbounded buffering crashes a process
  • Use highWaterMark and stream.pipeline for correct flow control
  • Consume streams safely with async iterators

Backpressure is what a slow consumer uses to tell a fast producer to slow down. It sounds like a niche stream detail, but it’s one of the most important ideas in all of systems engineering: any time data flows from a fast source to a slower sink, something has to absorb the difference — and if nothing pushes back, that something is an ever-growing buffer that eventually eats all your memory.

This is the flagship Node lesson of the section, because Node’s stream module is built around backpressure. Get it right and you can pipe a 50 GB file through a transform on a 512 MB container. Get it wrong and the same container OOM-crashes on a file barely larger than its memory limit.

The core problem: producer faster than consumer

Imagine reading from a fast SSD (hundreds of megabytes per second or more) and writing to a slow network socket (often a small fraction of that). The read side can produce data far faster than the write side can drain it. Where does the excess go?

[Figure: Backpressure & Node Streams architecture diagram]

If the producer keeps shoving data into the buffer regardless of how full it is, the buffer grows without limit. In a system with backpressure, the consumer gets to say “stop, I’m full” and the producer pauses until there’s room. That signal is backpressure.

The naive version that crashes

The trap looks innocent. You read a file and write it somewhere, ignoring the return value of write:

The unbounded-buffer crash script.js
import { createReadStream } from 'node:fs';

// ❌ readable emits 'data' as fast as it can read. We write each chunk,
//    but ignore whether the destination is keeping up. If the source is
//    faster than the sink, chunks pile into the write buffer — unbounded.
function copy(readable, writable) {
  readable.on('data', (chunk) => {
    writable.write(chunk); // returns false when buffer is full — we ignore it!
  });
  readable.on('end', () => writable.end());
}
// On a big file with a slow sink, memory climbs until the process is killed.

The bug is that writable.write(chunk) returns false once its internal buffer has reached the highWaterMark, the threshold that says “I’m full.” Ignoring that boolean means you keep writing into an already-full buffer, and Node has to queue the overflow in memory.
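To watch that signal being ignored, here is a minimal sketch. It assumes a deliberately slow in-memory sink: slowSink, its 16-byte highWaterMark, and the 10 ms delay are made up for illustration, not taken from a real workload. It records what write() returns and how many bytes are queued after each call:

```javascript
import { Writable } from 'node:stream';

// Hypothetical slow sink: tiny 16-byte highWaterMark, ~10 ms per chunk.
const slowSink = new Writable({
  highWaterMark: 16,
  write(chunk, encoding, callback) {
    setTimeout(callback, 10); // pretend the destination is slow
  },
});

const results = [];
for (let i = 0; i < 4; i++) {
  // Each chunk is 8 bytes. We keep writing even after write() says "full".
  const ok = slowSink.write(Buffer.alloc(8));
  results.push({ ok, buffered: slowSink.writableLength });
}

console.log(results);
// [ { ok: true,  buffered: 8 },
//   { ok: false, buffered: 16 },
//   { ok: false, buffered: 24 },
//   { ok: false, buffered: 32 } ]
```

The stream never refuses a chunk. It only reports false and keeps queueing, which is why the queue can grow well past the threshold when the caller does not stop.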

highWaterMark: the threshold that signals “full”

Every Node stream has a highWaterMark: the amount of buffered data (bytes for binary, objects for object-mode) at which it considers itself full. When a writable’s buffer crosses it, write() returns false, and you’re expected to stop writing until the stream emits a 'drain' event. Honoring that handoff is backpressure done by hand:

Honoring backpressure manually script.js
function copy(readable, writable) {
  readable.on('data', (chunk) => {
    const ok = writable.write(chunk);
    if (!ok) {
      readable.pause();                         // sink is full → stop reading
      writable.once('drain', () => readable.resume()); // resume when it empties
    }
  });
  readable.on('end', () => writable.end());
}

That works — but it’s fiddly, and it’s easy to forget the error handling. Which is exactly why you almost never write it by hand.
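If you do have to write by hand, one way to make the handoff less fiddly is to wait for 'drain' with a promise. The sketch below uses once from node:events and finished from node:stream/promises; writeAll and slowSink are hypothetical names chosen for illustration. Because once() rejects if the stream emits 'error' while you wait, a failure surfaces as a thrown exception rather than a hang:

```javascript
import { once } from 'node:events';
import { Writable } from 'node:stream';
import { finished } from 'node:stream/promises';

// Write every chunk, pausing for 'drain' whenever write() reports a full buffer.
// Returns how many times we had to wait.
async function writeAll(writable, chunks) {
  let waits = 0;
  for (const chunk of chunks) {
    if (!writable.write(chunk)) {
      waits += 1;
      await once(writable, 'drain'); // rejects if 'error' fires first
    }
  }
  writable.end();
  await finished(writable); // resolves once everything has been flushed
  return waits;
}

// Hypothetical slow sink with a 4-byte highWaterMark, for demonstration only.
const received = [];
const slowSink = new Writable({
  highWaterMark: 4,
  write(chunk, encoding, callback) {
    received.push(chunk.toString());
    setTimeout(callback, 5);
  },
});

const waits = await writeAll(slowSink, ['aaaa', 'bbbb', 'cccc']);
console.log(received, waits);
```

This is still hand-wiring, so treat it as a fallback for cases where the data does not come from a stream in the first place.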

stream.pipeline: backpressure and cleanup, done right

stream.pipeline wires streams together and handles backpressure, error propagation, and resource cleanup for you. It’s the correct default for connecting streams:

pipeline handles flow control and errors script.js
import { pipeline } from 'node:stream/promises';
import { createReadStream, createWriteStream } from 'node:fs';
import { createGzip } from 'node:zlib';

// Read → gzip → write, with backpressure across all three stages, plus
// automatic teardown if any stage errors. Memory stays bounded regardless
// of file size, because each stage pulls only as fast as the next accepts.
await pipeline(
  createReadStream('huge.log'),
  createGzip(),
  createWriteStream('huge.log.gz'),
);

Compare this to the crashing version: same logical task, but pipeline propagates the slow writer’s “I’m full” signal all the way back to the reader, so the reader only pulls as fast as the slowest stage drains. The buffer never grows unbounded. Avoid the old .pipe() unless you add your own error handling: it manages backpressure, but it does not forward errors or destroy the other streams when one fails, so a failed stage can leak file descriptors. And never wire streams by hand when pipeline will do it.
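To make the error-propagation half concrete, here is a small sketch under stated assumptions: flakySink is a hypothetical sink that fails on its second chunk, and the middle stage is an async generator function, which pipeline accepts in place of a Transform stream. The in-memory source stands in for a real file:

```javascript
import { pipeline } from 'node:stream/promises';
import { Readable, Writable } from 'node:stream';

const source = Readable.from(['alpha\n', 'beta\n', 'gamma\n']);

// Hypothetical sink that simulates a failure on the second chunk.
let seen = 0;
const flakySink = new Writable({
  write(chunk, encoding, callback) {
    seen += 1;
    callback(seen === 2 ? new Error('disk full (simulated)') : null);
  },
});

let failure = null;
try {
  await pipeline(
    source,
    // An async generator works as a transform stage.
    async function* upperCase(chunks) {
      for await (const chunk of chunks) yield String(chunk).toUpperCase();
    },
    flakySink,
  );
} catch (err) {
  failure = err; // the sink's error arrives as a rejected promise
}

console.log(failure?.message);
console.log(source.destroyed); // pipeline tears down the upstream source too
```

A single try/catch around the await covers every stage, which is the part that is easy to get wrong with hand-wired 'error' listeners.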

Async iterators: backpressure you can read with a for-loop

Modern Node lets you consume a readable stream with for await...of, and it preserves backpressure automatically — the loop body acts as the consumer, and the stream won’t read ahead faster than your loop processes:

Bounded processing with async iteration script.js
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

async function processHugeFile(path) {
  const lines = createInterface({
    input: createReadStream(path),
    crlfDelay: Infinity,
  });

  // The await inside the loop IS the backpressure: the stream pauses
  // while we work on each line, so memory never balloons — even on a
  // file far larger than RAM.
  for await (const line of lines) {
    // indexLine is a placeholder for your own slow, async per-line work.
    await indexLine(line); // the stream waits for us
  }
}

The await inside the loop is the whole trick: while you’re awaiting indexLine, nothing asks the stream for more data, so it reads ahead only until its internal buffer fills and then waits. It can’t outrun you. You get bounded memory with code that reads like a plain loop.
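You can observe the bounded read-ahead directly. The sketch below is an illustration under assumptions rather than a benchmark: source is a made-up generator wrapped with Readable.from, and the 2 ms sleep stands in for slow per-record work. It tracks how far the producer ever gets ahead of the consumer:

```javascript
import { Readable } from 'node:stream';
import { setTimeout as sleep } from 'node:timers/promises';

let produced = 0;

// Hypothetical fast producer: it could emit all 20 records instantly.
async function* source() {
  for (let i = 0; i < 20; i++) {
    produced += 1;
    yield `record-${i}`;
  }
}

let consumed = 0;
let maxLead = 0; // largest gap between records produced and records consumed

for await (const record of Readable.from(source())) {
  maxLead = Math.max(maxLead, produced - consumed);
  await sleep(2); // stand-in for slow per-record work
  consumed += 1;
}

console.log({ produced, consumed, maxLead });
```

If nothing were pushing back, the generator would finish all 20 records before the first one was processed and maxLead would be 20. With the stream in between, the lead stays at a handful of records no matter how long the input is.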

You now know how to keep data flowing safely under load. The next question is how to see what’s happening when it doesn’t — observability.