This is the first post in my Codeforces blog! I plan to use this as a place where I can write down new algorithms I learned, problems that I found interesting, or stupid mistakes I made during contests.
Today, the topic will be the Z Algorithm, which I learned as a result of failing to solve Problem B of Beta Round 93 (http://codeforces.com/contest/126/problem/B). There are some other solutions like binary search + hashing, but this one is quite nice. Anyway, first, a description of the algorithm and why it works; it is simple and makes a lot of sense (as all good algorithms are).
Algorithm
Given a string S of length n, the Z Algorithm produces an array Z where Z[i] is the length of the longest substring starting from S[i] which is also a prefix of S, i.e. the maximum k such that S[j] = S[i + j] for all 0 ≤ j < k. Note that Z[i] = 0 means that S[0] ≠ S[i]. For easier terminology, we will refer to substrings which are also a prefix as prefix-substrings.
The algorithm relies on a single, crucial invariant. As we iterate over the letters in the string (index i from 1 to n - 1), we maintain an interval [L, R] which is the interval with maximum R such that 1 ≤ L ≤ i ≤ R and S[L...R] is a prefix-substring (if no such interval exists, just let L = R = - 1). For i = 1, we can simply compute L and R by comparing S[0...] to S[1...]. Moreover, we also get Z[1] during this.
Now suppose we have the correct interval [L, R] for i - 1 and all of the Z values up to i - 1. We will compute Z[i] and the new [L, R] by the following steps:
- If i > R, then there does not exist a prefix-substring of S that starts before i and ends at or after i. If such a substring existed, [L, R] would have been the interval for that substring rather than its current value. Thus we "reset" and compute a new [L, R] by comparing S[0...] to S[i...] and get Z[i] at the same time (Z[i] = R - L + 1).
- Otherwise, i ≤ R, so the current [L, R] extends at least to i. Let k = i - L. We know that Z[i] ≥ min(Z[k], R - i + 1) because S[i...] matches S[k...] for at least R - i + 1 characters (they are in the [L, R] interval which we know to be a prefix-substring). Now we have a few more cases to consider.
- If Z[k] < R - i + 1, then there is no longer prefix-substring starting at S[i] (or else Z[k] would be larger), meaning Z[i] = Z[k] and [L, R] stays the same. The latter is true because [L, R] only changes if there is a prefix-substring starting at S[i] that extends beyond R, which we know is not the case here.
- If Z[k] ≥ R - i + 1, then it is possible for S[i...] to match S[0...] for more than R - i + 1 characters (i.e. past position R). Thus we need to update [L, R] by setting L = i and matching from S[R + 1] forward to obtain the new R. Again, we get Z[i] during this.
The process computes all of the Z values in a single pass over the string, so we're done. Correctness is inherent in the algorithm and is pretty intuitively clear.
Analysis
We claim that the algorithm runs in O(n) time, and the argument is straightforward. We never compare characters at positions less than R, and every time we match a character R increases by one, so there are at most n comparisons there. Lastly, we can only mismatch once for each i (it causes R to stop increasing), so that's another at most n comparisons, giving O(n) total.
Code
Simple and short. Note that the optimization L = R = i is used when S[0] ≠ S[i] (it doesn't affect the algorithm since at the next iteration i > R regardless).
int L = 0, R = 0;
for (int i = 1; i < n; i++) {
if (i > R) {
L = R = i;
while (R < n && s[R-L] == s[R]) R++;
z[i] = R-L; R--;
} else {
int k = i-L;
if (z[k] < R-i+1) z[i] = z[k];
else {
L = i;
while (R < n && s[R-L] == s[R]) R++;
z[i] = R-L; R--;
}
}
}Application
One application of the Z Algorithm is for the standard string matching problem of finding matches for a pattern T of length m in a string S of length n. We can do this in O(n + m) time by using the Z Algorithm on the string T Φ S (that is, concatenating T, Φ, and S) where Φ is a character that matches nothing. The indices i with Z[i] = m correspond to matches of T in S.
Lastly, to solve Problem B of Beta Round 93, we simply compute Z for the given string S, then iterate from i to n - 1. If Z[i] = n - i then we know the suffix from S[i] is a prefix, and if the largest Z value we've seen so far is at least n - i, then we know some string inside also matches that prefix. That gives the result.
int maxz = 0, res = 0;
for (int i = 1; i < n; i++) {
if (z[i] == n-i && maxz >= n-i) { res = n-i; break; }
maxz = max(maxz, z[i]);
}







That site is awesome, if someone could translate it to english, i think that it would be the best algorithm reference ever.
thanx for sharing this site with us :)
import java.io.*;
import java.util.*;
public class D {
public static void main(String[] args) throws IOException{
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
String s = in.readLine();
char[] c = s.toCharArray();
int n = c.length;
int[] memo = new int[n];
/* memo[i] will store the length of the longest prefix of s that matches the tail of s1...si */
for (int i=1; i<n; i++) {
int j=i;
while (j>0 && c[i] != c[memo[j-1]]) j = memo[j-1];
memo[i] = (j>0) ? memo[j-1]+1 : 0;
}
int max = 0;
for (int i=1; i<n-1; i++) max = Math.max(max, memo[i]);
/* max = Maximum internal match to prefix */
j = memo[n-1];
while (j>max) j = memo[j-1];
System.out.println((j==0)? "Just a legend" : s.substring(0, j));
}
}
may be this will work
for i = 1 to np[i + z[i]] = max(p[i + z[i]], z[i])and one more travesing from n to 1. doing
p[i] = max(p[i+1] - 1, p[i])A portion of necroposting.
You can solve 2 problems: 1) Create a sample string with given Z-function over an infinite alphabet. Just follow Z-function generation algorithm and store all the equalities in DST. Next, check the result. 2) Create a sample string with given prefix-function over an infinite alphabet. Just follow prefix function generation algorithm and store all the equalities in DST. Next, check the result.
(in Russian: http://contest.samara.ru/ru/problemset/735/)
No advantages =)
But if we compare Z-function and Prefix-function then:
1) "monotony"
If z[i] = x, it means substrings [0..j] [i..i+j] are equals for all j from 0 to x.
If p[i] = x, it means [0..j] = [i..i+j] for all j = x, p[x], p[p[x]] and so on.
2) LCP (Largest Common Prefix)
Z-function in fact calculates LCP[0,j] for all j. It can be used for not only substring searching.
I also have two examples of problems which, I hope, show advantages Z-function over Prefix-function.
1) Determine number (No.) of the string in its suffix array in O(n).
2) Determine number (amount) of substrings, which have at least two occuarances in the string in O(n2) (of course, you can solve it in O(n) using suffix tree).
Very good algorithm
Hi, I think that assumption is wrong, take this test case:
Is it right or I misunderstood what he said? :o
(sorry for editing, I am new to Markdown :p)
Amazing tutorial. Thanks to you and e-maxx I have learned a new algorithm today :D
Thanks for this amazing tutorial. Keep writing! :)
Very nice explanation of the algorithm , thanks !
There seems to be one more condition missing in the routine, actually not missing but Algo is doing more work for that case. So, there can be three cases when I < R and there are as follows
1) Z[k] < R-I+1 in this case Z[I] = Z[k] //Z[k] Length will be strictly less than R-I+1 2) Z[K] > R-I+1 in this case Z[I] = R-I+1 //This is the missing case. 3) Z[k] = R-I+1 in this case Z[I] = R-I +1 + [keep computing the Z values starting from the position R] // This is because we know Z[K] is strictly = R-I+1 (Beta) but Z[I] can still match with Patterns next character thus start computing the Z values for this Z[I] starting position from R until the mismatch is found.