## KMP String Searching Algorithm

**String strStr Knuth-Morris-Pratt algorithm**

The Knuth-Morris-Pratt string searching algorithm (or KMP algorithm) searches for occurrences of a “word” S within a main “text string” T by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters. *(Paragraph cited from https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm)*

The most straightforward strStr() algorithm is to look for a match of S at every index in text T, which takes O(TS) time in the worst case, while the KMP has a better worst case performance of O(T + S) = O(T).

The difference is that KMP makes use of previous match information from the jump table of string S that the straightforward algorithm does not.

# Example of KMP searching

Above example explains how the jumps take place during a search. But how do we learn the number of steps to jump at each round?

The occurrences of S's prefix in T is the key to the matter.

# Building a Jump Table for S

For S = "ABCDABCA"

The jump table is a integer array that indicates the length of the matching prefix and suffix in S.

**I.**If S and T matches till i = 0, the part matched is "A". Then look for the occurrence of S's prefix in "" (Cut the first character in the matched part because we don't need to compare S[0] = 'A' and T[x + 0] = 'A' again). No prefix of S exists in "". Put 0 to the jump table for i = 0. S = "ABCDABCA" J = 0

**II.**If S and T matches till i = 1, the part matched is "AB". Then look for the occurrence of S's prefix in the suffix of "B". No prefix of S exists in "B". Put 0 to the jump table for i = 1. S = "ABCDABCA" J = 00

**III.**If S and T matches till i = 2, the part matched is "ABC". Then look for the occurrence of S's prefix in the suffix of "BC". No prefix of S exists in "BC". Put 0 to the jump table for i = 2. S = "ABCDABCA" J = 0 0 0

**IV.**If S and T matches till i = 3, the part matched is "ABCD". Then look for the occurrence of S's prefix in the suffix of "BCD". No prefix of S exists in "BCD". Put 0 to the jump table for i = 3. S = "ABCDABCA" J = 00 00

**V.**If S and T matches till i = 4, the part matched is "ABCDA". Then look for the occurrence of S's prefix in the suffix of "BCDA". The suffix "A" in "BCDA" is a prefix of S. As the matching prefix and suffix has a length of 1, put 1 to the jump table for i = 4. S = "ABCDABCA" J = 00001

**VI.**If S and T matches till i = 5, the part matched is "ABCDAB". Then look for the occurrence of S's prefix in the suffix of "BCDAB". The suffix "AB" in "BCDAB" is a prefix of S. As the matching prefix and suffix has a length of 2, put 2 to the jump table for i = 5. S = "ABCDABCA" J = 00 0 01 2

**IIV.**If S and T matches till i = 6, the part matched is "ABCDABC". Then look for the occurrence of S's prefix in the suffix of "BCDABC". The suffix "ABC" in "BCDABC" is a prefix of S. As the matching prefix and suffix has a length of 3, put 3 to the jump table for i = 6. S = "ABCDABCA" J = 0 00 01 23

**VIII.**If S and T matches till i = 7, the part matched is "ABCDABCA". Then look for the occurrence of S's prefix in the suffix of "BCDABCA". The suffix "A" in "BCDABCA" is a prefix of S. As the matching prefix and suffix has a length of 1, put 1 to the jump table for i = 7. S = "ABCDABCA" J = 0 0 0012 3 1

The jump table for string S is

[0, 0, 0, 0, 1, 2, 3, 1]

# Implementation of Jump Table

**Base case**At i = 0, set J[0] = 0.

**General case**At i >= 1, let idx = i, LOOP: Check for the characters S[J[i - 1] ] == S[idx] if true, J[i] = J[i - 1] + 1 (size of the matching prefix and suffix increased by 1) while false, let i = J[i - 1], check for S[J[i - 1]] == S[idx] (Repeat LOOP until i == 0, if S[0] != S[idx], then set J[idx] = 0, otherwise J[idx] = 1) The purpose of the loop cover the case when string S looks like "AABABXXXAABAA" 01 010 00012 34 idx i To find J[idx], we first check whether S[idx] == S[J[idx - 1]], which is checking whether the prefix "AABAB" matches the suffix "AABAA", but no. Does this mean that no prefix and suffix in S match any more? Obviously there could still be a match. So update i = S[J[i - 1]]. "AABABXXXAABAA" 01 010 00012 34 idx i Check for S[idx] == S[J[i - 1]], which checks for whether the prefix "AA" matches the suffix "AA". Sure it does. Therefore J[idx] = J[i - 1] + 1 = 2. "AABABXXXAABAA" 01 010 00012 34 2 i

```
int[] table(String s) {
int[] t = new int[s.length()];
int i = 1, len = 0;
while(i < s.length()) {
if(s.charAt(i) == s.charAt(len)) {
len++;
t[i] = len;
i++;
} else {
if(len == 0) {
i++; //t[i] = 0;
} else {
len = t[len - 1 ];
}
}
}
return t;
}
```

# How many steps to jump during a search?

## Implementation

```
//find needle in haystack, i iterates through haystack and j iterates through needle.
public void strStr(String haystack, String needle) {
int[] next = table(needle); //get jump table
int i = 0, j = 0;
while (i <= haystack.length() - needle.length()) {
if (haystack.charAt(i + j) == needle.charAt(j)) {
j++;
if (j == needle.length()) { //needle found
System.out.println(i); //print where needle is found
i = i + j - next[j - 1]; //jump
j = 0; //reset j and look for the next match
}
} else {
if(j == 0) i++; //first chars mismatch
else {
i = i + j - next[j - 1]; //jump
j = 0; //reset j and look for the next match
}
}
}
}
```

#### Get one-to-one training from Google Facebook engineers

##### Top-notch Professionals

Learn from Facebook and Google senior engineers interviewed 100+ candidates.

Most recent interview questions and system design topics gathered from aonecode alumnus.

One-to-one online classes. Get feedbacks from real interviewers.

##### Customized Private Class

Already a coding expert? - Advance straight to hard interview topics of your interest.

New to the ground? - Develop basic coding skills with your own designated mentor.

Days before interview? - Focus on most important problems in target company question bank.