String Searching

Understand Why Naive Pattern Search Is O(nm) and What the Alternatives Offer

String Searching

String searching underpins indexOf, includes, and regex — knowing the naive algorithm's worst case and why KMP and Rabin-Karp improve on it is essential context for choosing the right tool.

5 min read Level 2/5 #dsa#strings#searching
What you'll learn
  • Implement and analyse the naive O(nm) pattern-matching algorithm
  • Explain the key insight behind KMP and why it avoids redundant comparisons
  • Use indexOf and includes with awareness of their underlying linear scan

Searching for a pattern string p inside a text string t is one of the most common string operations. JavaScript’s built-in indexOf and includes hide the implementation, but understanding what they must do — and what cleverer algorithms can avoid — makes you a better user of both.

Naive Algorithm

The simplest approach: try to match p at every position in t.

function naiveSearch(text, pattern) {
  const n = text.length;
  const m = pattern.length;
  const matches = [];

  for (let i = 0; i <= n - m; i++) {
    let j = 0;
    while (j < m && text[i + j] === pattern[j]) {
      j++;
    }
    if (j === m) matches.push(i); // full match at position i
  }
  return matches;
}

// Walk-through: text="aababab", pattern="abab"
// i=0: text[0..3]="aaba" → mismatch at j=1
// i=1: text[1..4]="abab" → full match → push 1
// i=2: text[2..5]="abab" → full match → push 2  (overlapping allowed)
// i=3: text[3..6]="abab" → full match → push 3
console.log(naiveSearch("aababab", "abab")); // [1, 2, 3]

Complexity Analysis

CaseTimeExample
Best (first char never matches)O(n)text=“aaa…a”, pattern=“b…”
Average (random text)O(n + m) in practiceEnglish prose
Worst (many near-matches)O(nm)text=“aaa…a”, pattern=“aaa…ab”

The worst case arises when many partial matches occur. V8’s indexOf uses a variant of Boyer-Moore-Horspool or a similar sub-linear algorithm for long patterns, but the contract remains that it cannot be better than O(n) overall.

KMP — The Key Insight

Knuth-Morris-Pratt avoids rescanning characters that have already been matched. It precomputes a “failure function” (also called the partial-match table) for the pattern in O(m). During the search, when a mismatch occurs at position j in the pattern, the failure function says how far back in the pattern to resume — potentially skipping many characters in the text without re-examining them.

Total complexity: O(n + m) time, O(m) space.

Rabin-Karp — Rolling Hash

Rabin-Karp computes a hash of each length-m window of the text and compares it to the hash of the pattern. A rolling hash updates in O(1) by subtracting the outgoing character’s contribution and adding the incoming one.

Average case: O(n + m). Worst case: O(nm) if every hash matches but the string comparison fails (hash collision). In practice, choosing a good hash function makes collisions negligibly rare.

Z-Algorithm

The Z-array for a string s stores, for each position i, the length of the longest substring starting at i that is also a prefix of s. Appending the pattern to the text with a separator and computing the Z-array solves pattern matching in O(n + m)/O(n + m).

Built-In Methods

const text = "the cat sat on the mat";

// indexOf returns first match position or -1
console.log(text.indexOf("cat"));       // 4
console.log(text.indexOf("dog"));       // -1

// includes returns boolean — slightly more readable
console.log(text.includes("sat"));      // true

// lastIndexOf scans from the end
console.log(text.lastIndexOf("the"));   // 15

All three methods perform a linear scan under the hood — they are O(n) per call.

Up Next

Before pattern matching can work correctly on all text, you need to understand how Unicode surrogate pairs can break naive character-by-character comparisons.

Unicode and Surrogate Pairs →