Showing posts with label algorithms. Show all posts
Showing posts with label algorithms. Show all posts

Sunday, November 13, 2011

Rabin-Karp Algorithm

A cool algorithm that can be a very good interview question. This algorithm deals with searching a set of strings in a large text. The typical use case is to detect plagiarism.

Here is the pseudo code from Wikipedia entry.

 1 function RabinKarp(string s[1..n], string sub[1..m])
 2     hsub := hash(sub[1..m]);  hs := hash(s[1..m])
 3     for i from 1 to n-m+1
 4         if hs = hsub
 5             if s[i..i+m-1] = sub
 6                 return i
 7         hs := hash(s[i+1..i+m])
 8     return not found
 
The trick here is not to let the line 7 run in O(m). This can be easily achieved by leveraging the fact that we already have the hash code computed for s[i] to s[i+m-1]. So, if we use a rolling-hash technique, we can get this done in constant time. For example,

s[i+1..i+m] = s[i..i+m-1] - s[i] + s[i+m]

The real benefit of this algorithm is seen when we search for multiple patterns within a text. We could use a bloom filter to store the hashes.

For the single pattern matches, a better algorithm is provided by Knuth-Morris-Pratt. The Wikipedia article on the same is really informative.

Thursday, October 27, 2011

Maximum in a sliding window

Given an array of n integers & a window of size w, keep sliding the window from the left-most to the right-most while emitting the maximum of the current window before every move.

The brute-force solution will work in O(nw) because for each of the (n-w) possible moves of the window, we need to look at all the w elements in each of those windows.

A slight improvement is possible by using a heap data structure. Maintain a max-heap of the pair (element & position). While removing an element and finding the replacement for the root, keep removing the elements that are not required. In the worst case, there would a space requirement of O(n) to achieve this.

The best approach for this problem is to use a double-ended queue (deque) data structure. The idea is to keep just the minimal number of elements in this queue while doing the operations. For example, when the current queue has (3 1 5) and the incoming element is 7, then there's no need to keep any of the elements in the queue other than the incoming one. So, keep inserting the elements into the back of the queue as long as they are smaller than the current last. Also, if you store the positions instead of the actual numbers in the deque, it would be easy to eliminate the stale ones from the front. This can run in O(n) as the numbers are inserted/deleted constant number of times.