Huffman Encoding

Build the Minimum-Cost Prefix Tree by Always Merging the Two Rarest Symbols

Huffman Encoding

Huffman encoding uses a greedy min-heap algorithm to build an optimal variable-length prefix code, guaranteeing no codeword is a prefix of another and minimizing expected bits per symbol.

6 min read Level 3/5 #dsa#greedy#huffman
What you'll learn
  • Build a Huffman tree from a frequency table using a min-heap
  • Explain why the two lowest-frequency nodes are always merged first
  • Describe the prefix-code property and why it enables unambiguous decoding

Huffman encoding assigns shorter binary codewords to frequent symbols and longer ones to rare symbols, minimizing the total number of bits needed to encode a message. The algorithm is greedy: at every step, merge the two nodes with the lowest frequency into a new internal node whose frequency is their sum.

The Prefix-Code Property

A prefix code is a set of codewords where no codeword is a prefix of another. This lets a decoder consume bits left-to-right and know exactly when a symbol boundary occurs — no look-ahead needed. A binary tree whose leaves are the symbols naturally produces a prefix code: the path from root to each leaf gives that symbol’s codeword.

Building the Tree with a Min-Heap

class MinHeap {
  constructor() { this.h = []; }
  push(node) {
    this.h.push(node);
    this._bubbleUp(this.h.length - 1);
  }
  pop() {
    const top = this.h[0];
    const last = this.h.pop();
    if (this.h.length) { this.h[0] = last; this._sinkDown(0); }
    return top;
  }
  get size() { return this.h.length; }
  _bubbleUp(i) {
    while (i > 0) {
      const p = (i - 1) >> 1;
      if (this.h[p].freq <= this.h[i].freq) break;
      [this.h[p], this.h[i]] = [this.h[i], this.h[p]];
      i = p;
    }
  }
  _sinkDown(i) {
    const n = this.h.length;
    while (true) {
      let min = i;
      const l = 2 * i + 1, r = 2 * i + 2;
      if (l < n && this.h[l].freq < this.h[min].freq) min = l;
      if (r < n && this.h[r].freq < this.h[min].freq) min = r;
      if (min === i) break;
      [this.h[min], this.h[i]] = [this.h[i], this.h[min]];
      i = min;
    }
  }
}

function buildHuffman(freqMap) {
  const heap = new MinHeap();
  for (const [sym, freq] of Object.entries(freqMap)) {
    heap.push({ sym, freq, left: null, right: null });
  }

  while (heap.size > 1) {
    const a = heap.pop();
    const b = heap.pop();
    heap.push({ sym: null, freq: a.freq + b.freq, left: a, right: b });
  }

  return heap.pop(); // root of the Huffman tree
}

Generating the Codewords

function getCodes(node, prefix = '', codes = {}) {
  if (node.sym !== null) {
    codes[node.sym] = prefix || '0'; // edge case: single symbol
    return codes;
  }
  getCodes(node.left,  prefix + '0', codes);
  getCodes(node.right, prefix + '1', codes);
  return codes;
}

const freqs = { a: 45, b: 13, c: 12, d: 16, e: 9, f: 5 };
const tree  = buildHuffman(freqs);
const codes = getCodes(tree);
console.log(codes);
// { a: '0', c: '100', b: '101', f: '1100', e: '1101', d: '111' }

Why Merge the Two Rarest First?

Exchange argument. In any optimal prefix tree, the two rarest symbols are siblings at the deepest level. If they were not, you could swap a deeper symbol with a rarer one higher in the tree and strictly decrease the weighted path length — a contradiction. Therefore always merging the two smallest frequencies is safe and the resulting tree is optimal.

SymbolFrequencyCodewordBits contributed
a45045
d1611148
b1310139
c1210036
e9110136
f5110020

Total: 224 bits for 100 symbols — versus 300 bits for a fixed 3-bit code.

Up Next

Before bit manipulation can shrink data further, you need to understand the bitwise operators JavaScript provides and its int32 coercion rules.

Bitwise Operators and int32 Coercion →