When the Producer Is Faster Than the Consumer
Backpressure & Node Streams
How backpressure keeps a fast producer from drowning a slow consumer, why unbounded buffering crashes a Node process, and how streams, pipeline, and async iterators give you flow control for free.
What you'll learn
- Explain backpressure and why unbounded buffering crashes a process
- Use highWaterMark and stream.pipeline for correct flow control
- Consume streams safely with async iterators
Backpressure is what a slow consumer uses to tell a fast producer to slow down. It sounds like a niche stream detail, but it’s one of the most important ideas in all of systems engineering: any time data flows from a fast source to a slower sink, something has to absorb the difference — and if nothing pushes back, that something is an ever-growing buffer that eventually eats all your memory.
This is the flagship Node lesson of the section, because Node’s stream module
is built around backpressure. Get it right and you can pipe a 50GB file through
a transform on a 512MB container. Get it wrong and you OOM-crash on a file that
would’ve fit on a floppy disk.
The core problem: producer faster than consumer
Imagine reading from a fast SSD (millions of bytes/sec) and writing to a slow network socket (a fraction of that). The read side can produce data far faster than the write side can drain it. Where does the excess go?
If the producer keeps shoving data into the buffer regardless of how full it is, the buffer grows without limit. In a language with backpressure, the consumer gets to say “stop, I’m full” and the producer pauses until there’s room. That signal is backpressure.
The naive version that crashes
The trap looks innocent. You read a file and write it somewhere, ignoring the
return value of write:
import { createReadStream } from 'node:fs';
// ❌ readable emits 'data' as fast as it can read. We write each chunk,
// but ignore whether the destination is keeping up. If the source is
// faster than the sink, chunks pile into the write buffer — unbounded.
function copy(readable, writable) {
readable.on('data', (chunk) => {
writable.write(chunk); // returns false when buffer is full — we ignore it!
});
readable.on('end', () => writable.end());
}
// On a big file with a slow sink, memory climbs until the process is killed. The bug is that writable.write(chunk) returns false when its internal buffer
is past the highWaterMark — the threshold that says “I’m full.” Ignoring that
boolean means you keep writing into an already-full buffer, and Node has to queue
the overflow in memory.
highWaterMark: the threshold that signals “full”
Every Node stream has a highWaterMark: the amount of buffered data (bytes for
binary, objects for object-mode) at which it considers itself full. When a
writable’s buffer crosses it, write() returns false, and you’re expected to
stop writing until the stream emits a 'drain' event. Honoring that handoff
is backpressure done by hand:
function copy(readable, writable) {
readable.on('data', (chunk) => {
const ok = writable.write(chunk);
if (!ok) {
readable.pause(); // sink is full → stop reading
writable.once('drain', () => readable.resume()); // resume when it empties
}
});
readable.on('end', () => writable.end());
} That works — but it’s fiddly, and it’s easy to forget the error handling. Which is exactly why you almost never write it by hand.
stream.pipeline: backpressure and cleanup, done right
stream.pipeline wires streams together and handles backpressure, error
propagation, and resource cleanup for you. It’s the correct default for
connecting streams:
import { pipeline } from 'node:stream/promises';
import { createReadStream, createWriteStream } from 'node:fs';
import { createGzip } from 'node:zlib';
// Read → gzip → write, with backpressure across all three stages, plus
// automatic teardown if any stage errors. Memory stays bounded regardless
// of file size, because each stage pulls only as fast as the next accepts.
await pipeline(
createReadStream('huge.log'),
createGzip(),
createWriteStream('huge.log.gz'),
); Compare this to the crashing version: same logical task, but pipeline
propagates the slow writer’s “I’m full” signal all the way back to the reader, so
the reader only pulls as fast as the slowest stage drains. The buffer never grows
unbounded. Never use the old .pipe() without error handling, and never wire
streams by hand when pipeline will do it.
Async iterators: backpressure you can read with a for-loop
Modern Node lets you consume a readable stream with for await...of, and it
preserves backpressure automatically — the loop body acts as the consumer, and
the stream won’t read ahead faster than your loop processes:
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';
async function processHugeFile(path) {
const lines = createInterface({
input: createReadStream(path),
crlfDelay: Infinity,
});
// The await inside the loop IS the backpressure: the stream pauses
// while we work on each line, so memory never balloons — even on a
// file far larger than RAM.
for await (const line of lines) {
await indexLine(line); // slow per-line work; stream waits for us
}
} The await inside the loop is the whole trick: while you’re awaiting
indexLine, the stream is paused, so it can’t outrun you. You get bounded memory
with code that reads like a plain loop.
You now know how to keep data flowing safely under load. The next question is how to see what’s happening when it doesn’t — observability.