Streams for Large Data Processing

Node.js Readable and Transform streams let you pipe data through an algorithm one chunk at a time, keeping memory flat even on multi-gigabyte inputs — including backpressure to avoid overwhelming downstream consumers.

5 min read Level 3/5 #dsa#nodejs#interview

What you'll learn

Process a large file line-by-line using readline and a Readable stream
Build a Transform stream that filters or maps chunks on the fly
Explain backpressure and how pipe handles it automatically

Loading a 4 GB log file with fs.readFileSync allocates 4 GB of heap — a quick way to crash a Node process. Streams let you handle the same file with a fixed memory footprint, because you never hold more than one chunk at a time.

Reading a File Line-by-Line

readline.createInterface wraps any Readable and emits one line event per newline. It is the idiomatic way to process CSVs, logs, or NDJSON files.

import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

async function countWords(filePath) {
  const stream = createReadStream(filePath, { encoding: 'utf8' });
  const rl = createInterface({ input: stream, crlfDelay: Infinity });

  let wordCount = 0;
  for await (const line of rl) {
    wordCount += line.split(/\s+/).filter(Boolean).length;
  }
  return wordCount;
}

const total = await countWords('./war-and-peace.txt');
console.log('Total words:', total);

The for await...of loop on the readline interface keeps backpressure intact — it only reads the next line when your code is ready for it.

Transform Streams for On-the-Fly Mapping

A Transform stream sits in the middle of a pipeline, receiving chunks and emitting transformed ones. This is useful when you want to filter rows, parse JSON, or apply an algorithm to each record without buffering everything.

import { createReadStream, createWriteStream } from 'node:fs';
import { Transform } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Keep only lines where the third CSV column exceeds a threshold
class FilterHighValue extends Transform {
  constructor(threshold) {
    super({ readableObjectMode: false, writableObjectMode: false });
    this._threshold = threshold;
    this._partial = '';
  }

  _transform(chunk, _encoding, callback) {
    const lines = (this._partial + chunk.toString()).split('\n');
    this._partial = lines.pop(); // save incomplete trailing line
    for (const line of lines) {
      const cols = line.split(',');
      if (Number(cols[2]) > this._threshold) {
        this.push(line + '\n');
      }
    }
    callback();
  }

  _flush(callback) {
    if (this._partial) this.push(this._partial + '\n');
    callback();
  }
}

await pipeline(
  createReadStream('./data.csv'),
  new FilterHighValue(1000),
  createWriteStream('./filtered.csv'),
);

Backpressure

Backpressure is the mechanism that prevents a fast producer from overwhelming a slow consumer. When the writable side’s internal buffer is full, write() returns false — the readable should pause until the drain event fires.

stream.pipeline (and its promise variant from node:stream/promises) wires backpressure automatically. Avoid manually piping and forgetting to handle drain — that is the most common stream memory leak.

API	Backpressure	Error forwarding
`readable.pipe(writable)`	Yes	Partial — must add error handlers
`stream.pipeline(...)`	Yes	Yes — one callback for all errors
`stream/promises pipeline`	Yes	Yes — rejects on any stream error

Up Next

Once your algorithm runs without crashing the process, you need to know where it spends its time. Profiling gives you that picture.

Profiling Node.js Algorithms →