Reservoir Sampling
Selects k items uniformly at random from a stream of unknown (or very large) size using O(k) memory, without storing the entire stream.
Syntax
reservoirSample(stream, k) Parameters
| Name | Type | Required | Description |
|---|---|---|---|
stream | Iterable<any> | Yes | Any iterable (array, generator, file lines) of unknown length. Each item is seen exactly once. |
k | number | Yes | Number of items to select. The reservoir holds exactly k items. |
Returns
any[] — An array of k items chosen uniformly at random from the stream.
Examples
function reservoirSample(stream, k) {
const reservoir = [];
let i = 0;
for (const item of stream) {
if (i < k) {
reservoir.push(item); // fill the reservoir first
} else {
const j = Math.floor(Math.random() * (i + 1));
if (j < k) reservoir[j] = item; // replace with probability k/(i+1)
}
i++;
}
return reservoir;
}
// Select 3 from 10 items — each item has equal 3/10 probability
const stream = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const sample = reservoirSample(stream, 3);
console.log(sample.length); // always 3
console.log(new Set(sample).size); // all distinct
Output
3
3
// Algorithm R: select 1 random item from a stream (k=1)
function randomPick(stream) {
let chosen = null, i = 0;
for (const item of stream) {
if (Math.random() * ++i < 1) chosen = item;
}
return chosen;
}
// Verify roughly uniform — count how often each value is picked
const counts = {};
for (let t = 0; t < 10000; t++) {
const v = randomPick([1, 2, 3, 4, 5]);
counts[v] = (counts[v] ?? 0) + 1;
}
const vals = Object.values(counts).map(c => Math.round(c / 100));
console.log(vals.every(v => v >= 18 && v <= 22)); // roughly 20% each
Output
true
Notes
| Case | Time | Space |
|---------|------|-------|
| Best | O(n) | O(k) |
| Average | O(n) | O(k) |
| Worst | O(n) | O(k) |
**Correctness proof sketch**: after seeing i items, each has probability
k/i of being in the reservoir (by induction). When item i+1 arrives,
it is placed with probability k/(i+1). Each existing item is evicted
with probability (k/(i+1)) · (1/k) = 1/(i+1), leaving it in the
reservoir with probability 1 - 1/(i+1) = i/(i+1). Combined:
k/i · i/(i+1) = k/(i+1). The **L algorithm** and **Z algorithm**
(not the string Z-algorithm) are faster variants that skip ahead
using a gap distribution, achieving O(k · (1 + log(n/k))) time.