Process Gigabyte Files Line-by-Line Without Loading Them into Memory
Streams for Large Data Processing
Node.js Readable and Transform streams let you pipe data through an algorithm one chunk at a time, keeping memory flat even on multi-gigabyte inputs — including backpressure to avoid overwhelming downstream consumers.
What you'll learn
- Process a large file line-by-line using readline and a Readable stream
- Build a Transform stream that filters or maps chunks on the fly
- Explain backpressure and how pipe handles it automatically
Loading a 4 GB log file with fs.readFileSync allocates 4 GB of heap — a
quick way to crash a Node process. Streams let you handle the same file with
a fixed memory footprint, because you never hold more than one chunk at a time.
Reading a File Line-by-Line
readline.createInterface wraps any Readable and emits one line event per
newline. It is the idiomatic way to process CSVs, logs, or NDJSON files.
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';
async function countWords(filePath) {
const stream = createReadStream(filePath, { encoding: 'utf8' });
const rl = createInterface({ input: stream, crlfDelay: Infinity });
let wordCount = 0;
for await (const line of rl) {
wordCount += line.split(/\s+/).filter(Boolean).length;
}
return wordCount;
}
const total = await countWords('./war-and-peace.txt');
console.log('Total words:', total); The for await...of loop on the readline interface keeps backpressure intact —
it only reads the next line when your code is ready for it.
Transform Streams for On-the-Fly Mapping
A Transform stream sits in the middle of a pipeline, receiving chunks and
emitting transformed ones. This is useful when you want to filter rows, parse
JSON, or apply an algorithm to each record without buffering everything.
import { createReadStream, createWriteStream } from 'node:fs';
import { Transform } from 'node:stream';
import { pipeline } from 'node:stream/promises';
// Keep only lines where the third CSV column exceeds a threshold
class FilterHighValue extends Transform {
constructor(threshold) {
super({ readableObjectMode: false, writableObjectMode: false });
this._threshold = threshold;
this._partial = '';
}
_transform(chunk, _encoding, callback) {
const lines = (this._partial + chunk.toString()).split('\n');
this._partial = lines.pop(); // save incomplete trailing line
for (const line of lines) {
const cols = line.split(',');
if (Number(cols[2]) > this._threshold) {
this.push(line + '\n');
}
}
callback();
}
_flush(callback) {
if (this._partial) this.push(this._partial + '\n');
callback();
}
}
await pipeline(
createReadStream('./data.csv'),
new FilterHighValue(1000),
createWriteStream('./filtered.csv'),
); Backpressure
Backpressure is the mechanism that prevents a fast producer from overwhelming a
slow consumer. When the writable side’s internal buffer is full, write()
returns false — the readable should pause until the drain event fires.
stream.pipeline (and its promise variant from node:stream/promises) wires
backpressure automatically. Avoid manually piping and forgetting to handle
drain — that is the most common stream memory leak.
| API | Backpressure | Error forwarding |
|---|---|---|
readable.pipe(writable) | Yes | Partial — must add error handlers |
stream.pipeline(...) | Yes | Yes — one callback for all errors |
stream/promises pipeline | Yes | Yes — rejects on any stream error |
Up Next
Once your algorithm runs without crashing the process, you need to know where it spends its time. Profiling gives you that picture.
Profiling Node.js Algorithms →