Reservoir Sampling

Selects k items uniformly at random from a stream of unknown (or very large) size using O(k) memory, without storing the entire stream.

Syntax

reservoirSample(stream, k)

Parameters

NameTypeRequiredDescription
stream Iterable<any> Yes Any iterable (array, generator, file lines) of unknown length. Each item is seen exactly once.
k number Yes Number of items to select. The reservoir holds exactly k items.

Returns

any[] — An array of k items chosen uniformly at random from the stream.

Examples

function reservoirSample(stream, k) {
  const reservoir = [];
  let i = 0;
  for (const item of stream) {
    if (i < k) {
      reservoir.push(item); // fill the reservoir first
    } else {
      const j = Math.floor(Math.random() * (i + 1));
      if (j < k) reservoir[j] = item; // replace with probability k/(i+1)
    }
    i++;
  }
  return reservoir;
}

// Select 3 from 10 items — each item has equal 3/10 probability
const stream = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const sample = reservoirSample(stream, 3);
console.log(sample.length);    // always 3
console.log(new Set(sample).size); // all distinct
Output
3
3
// Algorithm R: select 1 random item from a stream (k=1)
function randomPick(stream) {
  let chosen = null, i = 0;
  for (const item of stream) {
    if (Math.random() * ++i < 1) chosen = item;
  }
  return chosen;
}

// Verify roughly uniform — count how often each value is picked
const counts = {};
for (let t = 0; t < 10000; t++) {
  const v = randomPick([1, 2, 3, 4, 5]);
  counts[v] = (counts[v] ?? 0) + 1;
}
const vals = Object.values(counts).map(c => Math.round(c / 100));
console.log(vals.every(v => v >= 18 && v <= 22)); // roughly 20% each
Output
true

Notes

| Case | Time | Space | |---------|------|-------| | Best | O(n) | O(k) | | Average | O(n) | O(k) | | Worst | O(n) | O(k) | **Correctness proof sketch**: after seeing i items, each has probability k/i of being in the reservoir (by induction). When item i+1 arrives, it is placed with probability k/(i+1). Each existing item is evicted with probability (k/(i+1)) · (1/k) = 1/(i+1), leaving it in the reservoir with probability 1 - 1/(i+1) = i/(i+1). Combined: k/i · i/(i+1) = k/(i+1). The **L algorithm** and **Z algorithm** (not the string Z-algorithm) are faster variants that skip ahead using a gap distribution, achieving O(k · (1 + log(n/k))) time.

See also