Wednesday, December 14, 2011

Android - Send broadcast intents via adb shell

am broadcast -a <INTENT_NAME> -n <your.package>/.<path.up.until.your.BroadcastReceiver> --es "referrer" "utm_source=test_source&utm_medium=test_medium&utm_term=test_term&utm_content=test_content&utm_campaign=test_name"

The '-n' parameter targets that particular receiver; if we don't specify it, all the receivers registered for that intent will receive it. Note that when this is run through 'adb shell', the local shell consumes the double quotes, so the '&' characters in the referrer string may need extra escaping (e.g. wrapping the whole am command in single quotes).

A concrete example of the same is

am broadcast -a com.android.vending.INSTALL_REFERRER -n com.ashok.Apps/.MyAnalyticsReceiver --es "referrer" "utm_source=test_source&utm_medium=test_medium&utm_term=test_term&utm_content=test_content&utm_campaign=test_name"

Wednesday, November 23, 2011

Java knowhow - 2

By now it's quite obvious that the double-checked locking idiom for lazily initializing a singleton is broken under the original Java memory model. This has been comprehensively demonstrated based on the memory model's reordering rules & out-of-order writes. So, what's the alternative (while still initializing lazily)?
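For completeness, the revised memory model (Java 5+) offers one fix: declaring the instance field volatile makes double-checked locking correct. A minimal sketch, with an illustrative class name:

```java
public class LazySingleton {
    // 'volatile' is the crucial bit: without it, a second thread could observe
    // a non-null reference to a partially constructed object, due to the very
    // write reordering that breaks classic double-checked locking.
    private static volatile LazySingleton instance;

    private LazySingleton() { }

    public static LazySingleton getInstance() {
        LazySingleton result = instance;   // read the volatile field only once on the fast path
        if (result == null) {
            synchronized (LazySingleton.class) {
                result = instance;
                if (result == null) {
                    instance = result = new LazySingleton();
                }
            }
        }
        return result;
    }
}
```

The local variable is a common micro-optimization: the volatile field is read just once in the already-initialized case.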

The most famous solution seems to be from Bill Pugh, who achieves the same without the need for any synchronization. He does this by exploiting Java's guarantee that a class is initialized only when it is referenced for the first time.

Here's the implementation based on his solution (courtesy: Wikipedia).

public class Singleton {
    // Private constructor prevents instantiation from other classes
    private Singleton() { }

    /**
     * SingletonHolder is loaded on the first execution of Singleton.getInstance()
     * or the first access to SingletonHolder.INSTANCE, not before.
     */
    private static class SingletonHolder {
        public static final Singleton INSTANCE = new Singleton();
    }

    public static Singleton getInstance() {
        return SingletonHolder.INSTANCE;
    }
}

More on the JMM here (http://www.ibm.com/developerworks/java/library/j-jtp08223/). In addition, it contains an awesome explanation of how ConcurrentHashMap is implemented. An excerpt follows.

A JMM overview

Before we jump into the implementation of put(), get(), and remove(), let's briefly review the JMM, which governs how actions on memory (reads and writes) by one thread affect actions on memory by other threads. Because of the performance benefits of using processor registers and per-processor caches to speed up memory access, the Java Language Specification (JLS) permits some memory operations not to be immediately visible to all other threads. There are two language mechanisms for guaranteeing consistency of memory operations across threads -- synchronized and volatile.


According to the JLS, "In the absence of explicit synchronization, an implementation is free to update the main memory in an order that may be surprising." This means that without synchronization, writes that occur in one order in a given thread may appear to occur in a different order to a different thread, and that updates to memory variables may take an unspecified time to propagate from one thread to another.

While the most common reason for using synchronization is to guarantee atomic access to critical sections of code, synchronization actually provides three separate functions -- atomicity, visibility, and ordering. Atomicity is straightforward enough -- synchronization enforces a reentrant mutex, preventing more than one thread from executing a block of code protected by a given monitor at the same time. Unfortunately, most texts focus on the atomicity aspects of synchronization to the exclusion of the other aspects. But synchronization also plays a significant role in the JMM, causing the JVM to execute memory barriers when acquiring and releasing monitors.

When a thread acquires a monitor, it executes a read barrier -- invalidating any variables cached in thread-local memory (such as an on-processor cache or processor registers), which will cause the processor to re-read any variables used in the synchronized block from main memory. Similarly, upon monitor release, the thread executes a write barrier -- flushing any variables that have been modified back to main memory. The combination of mutual exclusion and memory barriers means that as long as programs follow the correct synchronization rules (that is, synchronize whenever writing a variable that may next be read by another thread or when reading a variable that may have been last written by another thread), each thread will see the correct value of any shared variables it uses.

Some very strange things can happen if you fail to synchronize when accessing shared variables. Some changes may be reflected across threads instantly, while others may take some time (due to the nature of associative caches). As a result, without synchronization you cannot be sure that you have a consistent view of memory (related variables may not be consistent with each other) or a current view of memory (some values may be stale). The common -- and recommended -- way to avoid these hazards is of course to synchronize properly. However, in some cases, such as in very widely used library classes like ConcurrentHashMap, it may be worth applying some extra expertise and effort in development (which may well be many times as much effort) to achieve higher performance.
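The visibility guarantee described above is easy to demonstrate with a small program (the class and method names here are mine, purely illustrative). A reader thread spins on a flag; declaring the flag volatile guarantees it observes the writer's update:

```java
public class VisibilityDemo {
    // 'stop' is volatile: a write in one thread is guaranteed to become visible
    // to the spinning reader thread. With a plain boolean field, the JMM would
    // allow the reader to keep using a cached value indefinitely.
    static volatile boolean stop = false;

    static String run() {
        Thread reader = new Thread(() -> {
            while (!stop) { /* spin until the write becomes visible */ }
        });
        reader.start();
        stop = true;                 // volatile write happens-before the reader's read
        try {
            reader.join(2000);       // generous timeout; should return almost instantly
        } catch (InterruptedException ignored) { }
        return reader.isAlive() ? "stale read" : "update seen";
    }
}
```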

Thursday, November 17, 2011

Java knowhow - 1

Often I run into situations where I want multiple constructors that do pretty much the same things, except that each calls a different version of the super class constructor. So, I end up creating a private initialization method that is called from each of these constructors. Too annoying. 'Instance initializer blocks' come in handy in these situations. Such a block of code is guaranteed to be copied into every constructor that you write. So, no more annoying private initialization methods. :)
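A sketch of the idea, with made-up class and field names:

```java
import java.util.ArrayList;
import java.util.List;

public class Widget {   // hypothetical class, just to show the mechanics
    private final List<String> log = new ArrayList<>();

    // Instance initializer block: the compiler copies this code into every
    // constructor that doesn't delegate to another one via this(...).
    {
        log.add("common init");
    }

    public Widget() { }                // runs the block, then this body
    public Widget(String name) { }     // runs the same block, no duplication

    public int logSize() { return log.size(); }
}
```

Field initializers and initializer blocks run in textual order, after the super-class constructor and before the constructor body.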

Sunday, November 13, 2011

Rabin-Karp Algorithm

A cool algorithm that can be a very good interview question. This algorithm deals with searching a set of strings in a large text. The typical use case is to detect plagiarism.

Here is the pseudo code from Wikipedia entry.

 1 function RabinKarp(string s[1..n], string sub[1..m])
 2     hsub := hash(sub[1..m]);  hs := hash(s[1..m])
 3     for i from 1 to n-m+1
 4         if hs = hsub
 5             if s[i..i+m-1] = sub
 6                 return i
 7         if i < n-m+1 then hs := hash(s[i+1..i+m])
 8     return not found
 
The trick here is not to let line 7 run in O(m). This can be easily achieved by leveraging the fact that we already have the hash computed for s[i..i+m-1]. So, if we use a rolling-hash technique, we can get this done in constant time. For example, with a simple additive hash,

hash(s[i+1..i+m]) = hash(s[i..i+m-1]) - s[i] + s[i+m]

(A polynomial hash needs a bit more care: subtract s[i] scaled by base^(m-1), multiply by the base, then add s[i+m].)

The real benefit of this algorithm is seen when we search for multiple patterns within a text. We could use a bloom filter to store the hashes.
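Here's a sketch of the single-pattern version in Java, using a polynomial rolling hash (the base & modulus choices are mine, not the only valid ones):

```java
public class RabinKarp {
    // Polynomial hash: h(s[i..i+m-1]) = s[i]*B^(m-1) + ... + s[i+m-1], mod Q.
    static final long B = 256, Q = 1_000_000_007L;

    // Returns the index of the first occurrence of sub in s, or -1.
    static int search(String s, String sub) {
        int n = s.length(), m = sub.length();
        if (m > n) return -1;
        long pow = 1;                              // B^(m-1) mod Q, used to drop s[i]
        for (int k = 1; k < m; k++) pow = pow * B % Q;
        long hsub = 0, hs = 0;
        for (int k = 0; k < m; k++) {
            hsub = (hsub * B + sub.charAt(k)) % Q;
            hs = (hs * B + s.charAt(k)) % Q;
        }
        for (int i = 0; ; i++) {
            // Verify on a hash match to rule out collisions.
            if (hs == hsub && s.regionMatches(i, sub, 0, m)) return i;
            if (i + m >= n) return -1;
            // Rolling update in O(1): drop s[i], append s[i+m].
            hs = ((hs - s.charAt(i) * pow % Q + Q) % Q * B + s.charAt(i + m)) % Q;
        }
    }
}
```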

For the single pattern matches, a better algorithm is provided by Knuth-Morris-Pratt. The Wikipedia article on the same is really informative.

Saturday, November 12, 2011

Java HashMap implementation, load factor, synchronization etc.,

HashMap is one of the most commonly used data structures in Java & I've always been curious about what exactly goes on behind its implementation. On the surface, it looks fairly straightforward: some simple hashing technique, with collision resolution of course. But what bothered me was where exactly the keys reside! It should have been obvious with a little more thought about how the collision resolution is done. Basically, the entries whose keys hash to the same bucket have to be accommodated using a linked list. If this is the case, how do we retrieve the value for a given key from among the candidates within that bucket? Simple: store the key along with the value, so that we can compare keys & return the value for the correct one. That also cleared up my confusion about where the keys live so that an iterator kind of function can pick up all the keys present in the hash map.

Another important thing to note about HashMap (or Collections in general) is their synchronization. The Collections class offers a bunch of static methods that return synchronized versions of the collections passed to them. But to what extent does this synchronization help? The doc says that it only guarantees individual operations to be synchronized & any compound operations built from them should be synchronized manually. Note that the internal synchronization uses the wrapper object itself as the lock & so we need to lock on the same object as well.
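For example (with illustrative names), a check-then-act sequence on a synchronized map still needs an explicit lock on the map object:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncMapDemo {
    static int demo() {
        Map<String, Integer> counts = Collections.synchronizedMap(new HashMap<>());
        counts.put("a", 1);              // individual operations are synchronized

        // A compound check-then-act is NOT atomic on its own: another thread
        // could insert "b" between containsKey and put. Lock on the map itself,
        // because that is the monitor the synchronized wrapper uses internally.
        synchronized (counts) {
            if (!counts.containsKey("b")) counts.put("b", 2);
        }
        return counts.get("a") + counts.get("b");
    }
}
```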

An interesting point here is, if we don't use the synchronized version of the collection, what are the side effects? The hint again lies in the collision resolution. Assume that when we are about to do a put, the load factor limit of the hash map is reached. When this happens, the map is rehashed into a new table with twice the number of buckets. Also, note that the linked list used for collision resolution inserts elements at the front of the list rather than the end (so as to avoid traversing to the tail on every insertion). Because of this, when the rehashing into the new table happens, the elements of a bucket's list end up in reverse order. Given all this, when two threads try to resize the map concurrently, it can potentially lead to an infinite loop: one thread can change the next pointer of an element in the list to point to its previous element, while the other thread changes the next pointer of that previous element to point back to the first one, forming a cycle. (Recent Java versions have reworked this part of HashMap, but the conclusion stands.) To avoid this tricky problem, synchronization is definitely needed on the collection if multiple threads are manipulating it.

Thursday, October 27, 2011

Maximum in a sliding window

Given an array of n integers & a window of size w, keep sliding the window from the left-most to the right-most while emitting the maximum of the current window before every move.

The brute-force solution works in O(nw) because for each of the (n-w+1) possible positions of the window, we need to look at all the w elements in that window.

A slight improvement is possible by using a heap. Maintain a max-heap of (element, position) pairs. Whenever the maximum at the root has a position outside the current window, keep removing such stale elements until the root lies within the window. In the worst case, there would be a space requirement of O(n), and the overall cost is O(n log n).

The best approach for this problem is to use a double-ended queue (deque). The idea is to keep only the minimal set of candidate elements in this queue. For example, when the current queue has (5 3 1) and the incoming element is 7, there's no need to keep any of the elements other than the incoming one. So, keep removing elements from the back of the queue as long as they are smaller than the incoming element, then insert it at the back. Also, if you store positions instead of the actual numbers in the deque, it is easy to eliminate the stale entries from the front. This runs in O(n) overall, as each number is inserted & deleted at most once.
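A sketch of the deque approach (class & method names are mine):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingMax {
    // Returns the maximum of every window of size w, in O(n) total:
    // each index enters and leaves the deque at most once.
    static int[] maxSlidingWindow(int[] a, int w) {
        int n = a.length;
        int[] out = new int[n - w + 1];
        Deque<Integer> dq = new ArrayDeque<>();   // holds indices; values strictly decreasing
        for (int i = 0; i < n; i++) {
            // Drop smaller elements from the back: they can never be a window maximum.
            while (!dq.isEmpty() && a[dq.peekLast()] <= a[i]) dq.pollLast();
            dq.addLast(i);
            // Drop the index that just slid out of the window, if any.
            if (dq.peekFirst() <= i - w) dq.pollFirst();
            if (i >= w - 1) out[i - w + 1] = a[dq.peekFirst()];
        }
        return out;
    }
}
```

For the input {1, 3, -1, -3, 5, 3, 6, 7} with w = 3, this yields {3, 3, 5, 5, 6, 7}.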

Wednesday, October 26, 2011

Propagate zeros to rows & columns of a matrix

Given a matrix of integers, if there are zeros in a particular position, make the whole of the corresponding row & column as zeros. Be as space conservative as possible.

The most straightforward solution to the problem is to have 2 arrays, one for storing the positions of zeros in the rows & the other for the columns. Obviously, we've to do better than this.

We can do the same with a single array too. The basic idea is to scan the matrix either row-by-row or column-by-column. Let's assume row-wise scan for this solution. While doing this scan, mark the columns having zeros in the array. If there's a zero in a row, make the whole row as zeros. After completing the row-wise scan, look at the array for the columns having zeros & make them all zeros in the second pass.

There's a beautiful third solution to this problem. Here we just use 2 variables to store the information required. Scan the first row & the first column and if there is a zero in those, mark the same in the two variables. Now, start scanning without the first row & column. If you encounter a zero, mark the same in the first element of the corresponding row & column as zero. Once this pass is completed, using the information in the first row & column, propagate the zeros to the other rows & columns. Finally, based on the information available in the 2 variables, modify the first row / column as required.
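A sketch of that third solution (names are mine):

```java
public class ZeroMatrix {
    // O(1) extra space: the first row & column double as marker arrays, and two
    // booleans remember whether the first row/column themselves contained a zero.
    static void propagateZeros(int[][] m) {
        int rows = m.length, cols = m[0].length;
        boolean firstRowZero = false, firstColZero = false;
        for (int j = 0; j < cols; j++) if (m[0][j] == 0) firstRowZero = true;
        for (int i = 0; i < rows; i++) if (m[i][0] == 0) firstColZero = true;
        // Mark zeros of the inner matrix into the first row & column.
        for (int i = 1; i < rows; i++)
            for (int j = 1; j < cols; j++)
                if (m[i][j] == 0) { m[i][0] = 0; m[0][j] = 0; }
        // Propagate the marks to the inner matrix.
        for (int i = 1; i < rows; i++)
            for (int j = 1; j < cols; j++)
                if (m[i][0] == 0 || m[0][j] == 0) m[i][j] = 0;
        // Finally, fix up the first row & column themselves.
        if (firstRowZero) for (int j = 0; j < cols; j++) m[0][j] = 0;
        if (firstColZero) for (int i = 0; i < rows; i++) m[i][0] = 0;
    }
}
```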


Sunday, October 23, 2011

Insert & Median

Maintain a data structure that supports two operations 'insert' & 'get median'. Preferably the median operation should be in constant time.

The first solution that came to mind was maintaining a sorted linear linked list of the elements. But insertion costs O(n) in that case (finding the insert position requires a linear scan), though 'get median' can be O(1) if we keep a pointer to the middle.

When looking for a better solution, I stumbled upon this ingenious approach of using 2 heaps. The basic idea is to maintain 2 heaps along with a count of the total number of elements seen so far. The first heap holds the smaller half of the elements & is maintained as a max heap, whereas the second heap holds the larger half & is maintained as a min heap. This way, the median is always the root of the max heap (if the count is odd) or the average of the roots of both heaps (if the count is even). The trick is in how we maintain this property throughout the insertions. As the elements are split evenly between the heaps, we know right upfront which heap a given insertion will land in. Once we know that, we can check whether the element fits directly into that heap or whether it displaces an existing element (which then moves to the other heap). Using this approach, we get an O(1) 'get median' operation & O(log n) insertion time.
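A sketch of the two-heap approach using Java's PriorityQueue (class name is mine):

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class RunningMedian {
    // maxHeap holds the smaller half, minHeap the larger half;
    // sizes differ by at most one, with maxHeap getting the extra element.
    private final PriorityQueue<Integer> maxHeap =
            new PriorityQueue<>(Collections.reverseOrder());
    private final PriorityQueue<Integer> minHeap = new PriorityQueue<>();

    public void insert(int x) {               // O(log n)
        maxHeap.offer(x);
        minHeap.offer(maxHeap.poll());        // keeps every maxHeap element <= every minHeap element
        if (minHeap.size() > maxHeap.size())  // rebalance so maxHeap has the extra element
            maxHeap.offer(minHeap.poll());
    }

    public double median() {                  // O(1)
        if (maxHeap.size() > minHeap.size()) return maxHeap.peek();
        return (maxHeap.peek() + minHeap.peek()) / 2.0;
    }
}
```

Pushing through the max heap first & immediately moving its root across is what maintains the ordering invariant between the two halves.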

Heap is my favorite data structure. Something about it just fascinates me no end. I would love to find out if there are other such interesting problems that can be solved using heaps.

Saturday, October 8, 2011

Indexing

Found this site http://puzzles.nigelcoldwell.co.uk/index.htm recently. Has a lot of good puzzles. Have heard about 50% of those in school & college. Nice to work out those things again.

On a similar note, the site http://www.ihas1337code.com (now moved to http://www.leetcode.com/) is a very valuable source for interview preparation for the big giants.

Sunday, October 2, 2011

Find the median of two sorted arrays

This is one of those rare problems where the solution to the generalized problem costs no more than the solution to the specific case. In other words, we can find any kth item of the combined array with the same complexity.

The brute-force approach is to do a merge (as in merge-sort) & find the median. The cost would be O(n+m), where n is the size of the first array & m the size of the second. A second simple observation is that we don't need a complete merge, just the first k elements, which brings the cost down to O(k). Obviously, we need something better than this. It turns out this can be done in O(lg n + lg m).

First, compare the middle elements of A and B, call them Ai and Bj. If Ai is between Bj-1 and Bj, we have just found the (i+j-1)th smallest element. Therefore, if we choose i and j such that i+j = k+1, we are able to find the kth smallest element. Basically, the above can be stated as:

Maintaining the invariant
i + j = k + 1,

If Bj-1 < Ai < Bj, then Ai must be the kth smallest,
or else if Ai-1 < Bj < Ai, then Bj must be the kth smallest.

When one of the above conditions is satisfied for our initial i & j, we are done. Otherwise, we need to decide which parts of the arrays to keep for further comparisons.

If Ai < Bj, then Ai < Bj-1 as well, because otherwise the first condition would have held & we would have chosen Ai. This means the chosen Ai is too low and we must consider bigger elements from A (i.e., indices i+1 to n). On the other hand, the chosen value from B is already too large, so we shall not consider anything above it (i.e., we keep indices 1 to j-1). So, basically we are shrinking our search space by half on each side at every step & hence the final cost is O(lg n + lg m).
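A sketch of the kth-smallest idea in Java. This is a closely related variant that discards about k/2 elements from one array per step, which gives the same logarithmic behaviour; the names are mine:

```java
public class KthSmallest {
    // Returns the k-th smallest (1-based) element of two sorted arrays in
    // O(log(n+m)): each step discards roughly k/2 elements from one array.
    static int kth(int[] a, int aLo, int[] b, int bLo, int k) {
        if (aLo >= a.length) return b[bLo + k - 1];      // a exhausted
        if (bLo >= b.length) return a[aLo + k - 1];      // b exhausted
        if (k == 1) return Math.min(a[aLo], b[bLo]);
        int i = Math.min(a.length - aLo, k / 2);         // take up to k/2 from each
        int j = Math.min(b.length - bLo, k / 2);
        if (a[aLo + i - 1] < b[bLo + j - 1])
            return kth(a, aLo + i, b, bLo, k - i);       // a's first i can't hold the answer
        else
            return kth(a, aLo, b, bLo + j, k - j);       // b's first j can't hold the answer
    }

    static double median(int[] a, int[] b) {
        int n = a.length + b.length;
        if (n % 2 == 1) return kth(a, 0, b, 0, n / 2 + 1);
        return (kth(a, 0, b, 0, n / 2) + kth(a, 0, b, 0, n / 2 + 1)) / 2.0;
    }
}
```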


Number of rectangles in a chessboard

I was asked by an interviewer to find the number of squares in a chessboard to start with. I simply took the iterative approach: a 1x1 board has 1, a 2x2 board has 5, a 3x3 board has 14 & so on. It all easily fits into the series 1², 1² + 2², 1² + 2² + 3², etc. Hence, for an 8x8 chessboard, it is the sum of the squares of the numbers from 1 to 8, i.e., 204.

Then he asked me if the same approach would work for finding the number of rectangles in the board. I tried for a couple of minutes & it didn't seem to form any pattern I could spot. Later, while looking up the solution on the net, I realized that a direct counting argument is much cleaner than hunting for a pattern.

The actual solution to the problem is very simple. Let's see how it works for a square first. The basic idea is to find out the number of possible types of squares & to see where all they can be fit. For example, the square of size 1x1 can be fit in 64 positions, the square of size 2x2 can be fit in 49 positions, the square of size 3x3 can be fit in 36 positions and so on.

For a rectangle, the same thing applies. We need to first identify the various possible shapes of the rectangles that can be formed. Anything from 1x1 to 1x8 and 8x1 to 8x8 is possible. We need to find out where all these would fit & that would be it.
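Equivalently, a rectangle is a choice of 2 of the 9 horizontal lines & 2 of the 9 vertical lines, giving C(9,2)² = 36² = 1296. Both counts can be verified with a tiny sketch (names are mine):

```java
public class Chessboard {
    // Returns { number of squares, number of rectangles } on an n x n board,
    // counting, for each shape, the positions where it fits.
    static long[] count(int n) {
        long squares = 0, rectangles = 0;
        for (int w = 1; w <= n; w++) {
            squares += (long) (n + 1 - w) * (n + 1 - w);       // w-by-w square placements
            for (int h = 1; h <= n; h++)
                rectangles += (long) (n + 1 - w) * (n + 1 - h); // w-by-h rectangle placements
        }
        return new long[] { squares, rectangles };
    }
}
```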

Learning JavaScript

Though a little late, I finally realized that I would pretty soon be ostracized if I didn't learn JavaScript. Being in an Internet company, it's more of a mandate than a nice-to-know language. Given that, I started learning it today & thought it would be good to capture some of the important points for later reference.
  1. JS doesn't have the notion of classes. Everything is an object. It's called a 'prototypal' OO language to mark this distinction. To remember this fact, here is an example: in a classical OO language, we would say "create an object called Bob which is of class Person", whereas in JS, we say "take the object Person & reuse it as a prototype for a new object called Bob".
  2. In JS, there are no private/public/protected access modifiers. Everything is public. But the effect of the same is achieved using scope.
  3. Nice things like 'Infinity', 'NaN' and '2e5' are available to express the numbers very clearly.
  4. All functions have the hidden argument called 'arguments' that helps in accessing all the parameters passed to that function as an array.
  5. Variables not declared using 'var' are implicitly declared with global scope.
  6. In JS, functions are data. That is, a function with name 'foo' is actually a variable of type 'function' with the value as the body of the function.
  7. 'Scope Chaining' exists in JS. That is, an inner function can access variables defined in the scope of the outer function. Also, it follows the usual 'Lexical Scope' for functions. That is, the environment is created while the function is defined rather than when it is run.
  8. Properties can be removed using the 'delete' operator (e.g., delete obj.prop). Note that it works on object properties (including implicit globals, which are just properties of the global object) but not on variables declared with 'var'.
  9. 'Closure' is a concept where a function keeps a link to its parent's scope even after the parent has returned. There are various ways in which this can happen & should be handled very carefully. The access protection mentioned in point 2 can be achieved using this.