Boyer-Moore Algorithm
π‘ Concept
Boyer-Moore Algorithm is a landmark string searching technique that minimizes character comparisons by βskipping aheadβ intelligently. Instead of checking every character, it leverages the structure of the pattern itself to jump across the text efficiently.
π Quick Intro
Boyer-Moore was built for speed in long texts. By preprocessing the pattern, it knows where itβs safe to skip, often leaping across large portions of text in one step. In practice, itβs one of the fastest algorithms for real-world string searching.
π§ Analogy
Imagine looking for a red book on a shelf. Instead of scanning every single spine, you jump ahead to spots where you see flashes of red. Boyer-Moore does the same: it skips unpromising sections and only checks where a match is realistically possible.
π§ Technical Explanation
- π© Bad Character Rule: On mismatch, shift the pattern so the mismatched text character lines up with its last occurrence in the pattern (or skip it if itβs absent).
- π Good Suffix Rule: If part of the pattern matched, reuse that knowledge to jump further ahead.
- β© Right-to-Left Scan: Checks characters starting from the end of the pattern, spotting mismatches faster.
- β‘ Performance: Worst-case is O(mn), but in practice it often runs much faster than KMP or naive search, especially with longer patterns.
π― Purpose & Use Case
- β
Blazing-fast text search in editors, IDEs, and command-line tools (
grep
). - β DNA sequence analysis in bioinformatics.
- β Search engines and compilers for keyword lookups.
- β Detecting malware signatures or plagiarism in large datasets.
π» Real Code Example
// Boyer-Moore search (using Bad Character heuristic)
int BoyerMooreSearch(string text, string pattern)
{
int[] last = new int[256];
for (int i = 0; i < last.Length; i++) last[i] = -1;
for (int i = 0; i < pattern.Length; i++)
last[(int)pattern[i]] = i;
int n = text.Length, m = pattern.Length, s = 0;
while (s <= n - m)
{
int j = m - 1;
while (j >= 0 && pattern[j] == text[s + j]) j--;
if (j < 0)
return s; // Found!
else
s += Math.Max(1, j - last[(int)text[s + j]]);
}
return -1;
}
// Example usage:
Console.WriteLine(BoyerMooreSearch("EXAMPLE IN A SAMPLE TEXT", "SAMPLE")); // Output: 12

β Interview Q&A
Q1: What is the main benefit of Boyer-Moore over naive search?
A: It skips over sections of text, reducing unnecessary comparisons.
Q2: Which two key heuristics power Boyer-Moore?
A: The Bad Character Rule and Good Suffix Rule.
Q3: How does Boyer-Moore decide how far to jump?
A: Based on the mismatched characterβs position in the pattern or matched suffixes.
Q4: In which direction does it check characters?
A: From right to left (end of the pattern backwards).
Q5: When does Boyer-Moore perform best?
A: On long patterns and large texts, especially with few pattern repetitions.
Q6: What is the Bad Character Rule?
A: It shifts the pattern so the bad character aligns with the rightmost occurrence in the pattern.
Q7: What does the Good Suffix Rule do?
A: It shifts the pattern based on matching suffixes after a mismatch.
Q8: How does Boyer-Moore preprocess the pattern?
A: By building tables for bad characters and good suffixes.
Q9: Is Boyer-Moore efficient for short patterns?
A: It's most efficient for longer patterns.
Q10: What is a real-world application of Boyer-Moore?
A: Plagiarism detection and text editors.
π MCQs
Q1. What is the Boyer-Moore algorithm designed for?
- Sorting
- Fast string search
- Compression
- Data hashing
Q2. Which rule lets Boyer-Moore skip text on mismatch?
- Quick Exit
- Suffix Check
- Bad Character Rule
- Heap Skip
Q3. Boyer-Moore scans the pattern in which direction?
- Left to right
- Random
- Middle out
- Right to left
Q4. Why does Boyer-Moore often beat KMP in practice?
- It sorts first
- Skips ahead using more pattern info
- It’s O(1)
- It reverses the pattern
Q5. Which area benefits from Boyer-Moore?
- Number sorting
- Plagiarism detection
- Array merging
- Stack overflow
Q6. What tables does Boyer-Moore preprocess?
- Hash table
- Bad character and good suffix tables
- Frequency table
- Priority queue
Q7. What happens when a mismatch occurs?
- Restart search
- Pattern shifts based on rules
- Search stops
- Swap characters
Q8. Which rule applies when a mismatch occurs at the end of the pattern?
- Bad Character Rule
- Good Suffix Rule
- Prefix Rule
- Suffix Rule
Q9. What is the worst-case time complexity of Boyer-Moore?
- O(n^2)
- O(n + m)
- O(m log n)
- O(n)
Q10. Which search algorithm is generally faster in practice, Boyer-Moore or naive?
- Naive
- Boyer-Moore
- KMP
- Rabin-Karp
π‘ Bonus Insight
Boyer-Mooreβs real genius lies in skipping, not scanning.
Thatβs why tools like grep
and IDEs prefer itβit can cut through gigabytes of text without slowing down.
π PDF Download
Need a handy summary for your notes? Download this topic as a PDF!