Wednesday, June 04, 2008

Facebook Wiretap puzzle - Heuristic Solution

The original problem description is available here at Facebook website (Illegal Wiretaps).

Since the puzzle is basically related to finding the minimal cost of analyzing all the names, we need to figure out the cost of assigning each name to the different programmers. The cost associated with each (name, programmer) pair is quite straightforward given all the weird behaviors described for each programmer :-)

Let's use matrix c[n, n] (bad feeling here because we are looking at n^2 at least here) to store the cost values where n is the number of names and m is the number of programmers. c[i, j] represents the cost for assigning name[i] to programmer j.

For i = 1 to n {
parse name[i] to get length[i], vowel[i], consonant[i];
for j = 1 to n {
c[i, j] = length[i];
if(j%2 == 0) c[i, j] += vowel[i]*1.5;
else c[i,j] += consonant[i];

//Need to find the greatest common divider
gcd = GCD(length[i], j);

//find the prime factors of the gcd
k = factor(gcd);

c[i,j] += k*2;
}
}

Now that we have the cost matrix, the problem basically becomes how to pick n values from this matrix so that the sum of these n values is minimized. In addition, these n values cannot share the same row or column numbers with each other. The solution to this part might be obvious to anyone with CS background, but it took me some time to associate this problem with an interesting problem (mincost maxflow) in graph theory: If we view the 26 names as 26 nodes on the left side and the 26 programmers as 26 nodes on the right side, then each left node has a full mesh connection to the 26 right nodes forming a bipartite graph; we can view each connection from left to right as having capacity 1, but the cost is equal to c[i, j]. In addition, if we connect a virtual source node to the 26 left nodes and a virtual sink node to the 26 right nodes, assuming the virtual connection each have capacity 1 and the same cost (e.g. 1), then the problem now becomes how to find the max flow from the source node to the sink node with lowest cost. The max flow in this case is always 26, but the minimal cost and mapping structure is what we want to find here.

More details will come ...

Other stuff related to the above solution:
1. Find the GCD of two numbers (Facebook programmers really like GCD as it is used also in the Korn shell puzzle):

GCD(i, j)
{
if (j==0) return i;
return gcd(j, i%j);
}

2. Find the prime factorization of a number. This is supposed to be a difficult problem as many of the crypto algorithms rely on the fact that it is hard to factor numbers. But luckily we are dealing with small numbers here (first of all, only 26 characters used; second of all, hope any reasonal people will not give their child a long name that takes hours or more to write :-)), so the follow simple algorithm should do:

factor(n)
{
k = 0; factor = 2;

while(n > factor) {
if (n%factor == 0) {
n = n/factor;
k++;
} else { factor++;
}
}

return k;
}

The return value k is the number of primes that n can be factored into.

Monday, June 02, 2008

Facebook Korn Shell Puzzle - heuristic solution

The original problem description is available here at Facebook website (Korn Shell).

The example itself gives a hint on how to start the thinking:
Mike Korn <-what you typed originally
Jodt Dabn
Kafh Flcn
Dlmw Mgzn
Fgjq Jixn
Mike Korn <-what appears eventually

For each character, there is group of characters related to it and they form a closed group {M, J, K, D, F}, {O, A, L, G, I}, {T, H, W, Q, E}, {B, C, Z, X, R}, {N}, the period of each group is 5, 5, 5, 5, and 1 respectively in this case, so the number required to get back to Mike Korn from initially typing this name would be least common multiplier of those period, which is 5 in this case.

Now if we generalize this problem for any name, then the question can be viewed as follow:
For a name containing n distinct characters, how do you divide 26 characters into m (m <= n) groups so that the least common multiplier (LCM) of the cardinality of these m groups can be maximized. The valid value for n is between 1 and 26, but it is obvious that even if n=26, you may still want to divide the characters into fewer than 26 groups to achieve max value.

Let's focus on finding the max value if we divide the 26 characters into m different groups. Let's define f(n, m) as the max value for dividing n characters into m groups.

For m=1, it is easy, the max period is n if we view the n characters as a single group. f(n, 1) = n, n = 26 in the facebook puzzle.

For m=2, we need to divide the n characters into two groups, the max period is

f(n, 2)
{
max = 1;
for i = 1 to n/2
if LCM(i, n-i) > max {max = LCM(i, n-i); }

return max;
}


For general m, the problem is a bit trickier, we can try to use the following recursive function to get f(n, m)

f(n, m)
{
if n = m { return 1;}
if m = 2 { return f(n, 2) as calculated above in m=2 case;}

max = 1;
for i=1 to n- m + 1
if LCM(i, f(n-i, m-1)) > max { max = LCM(i, f(n-i, m-1); }

return max;
}


If the top-down recursive approach is not preferred, then we can convert the above function into a bottom-up array calculation as follows:

Initialize f[n, n] to all 1;

for i = 1 to n {
for j = 1 to i {
if i = j then f[i, j] = 1;
else if j = 1 then f[i, j] = i;
else {
for k = j-1 to i-1
f[i, j] = max(f[i, j], LCM(i - k, f[k, j-1]));
}
}
}

Once you get f[n, m], then for a particular n, the max value will be the one given by max (f[n, i]) where i=1 to n.

In our case, n=26, the result shows that when i=5, you achieve the max value 1260. This means that when the name contains at least 5 distinct characters then there is a way to design the keying function so that you need to type 1260 times to get the correct name. The keying function is designed by dividing 26 characters into groups of 1, 4, 5, 7, and 9, each group containing at least one of the distinct characters in the name.

Now, the above solution gives us O(n^3) time complexity which is awful. Is there anyway we can optimize the solution? After let my mind wondering off some time at work :-) (sorry, I can't help but thinking about this thing ...), it seems that there might be another simpler way to formulate the solution:

Instead of looking at f(n, m), where m is the distinct number of groups that we need to divide the n characters, we can just look at the f(n), which represents the max value obtained by dividing the n character into n or fewer than m groups. Obviously f(1) = 1. The problem now becomes: given f(i) where i < n, how do we get f(n)? We can see that the new divisions allowed are:
- putting all n into one group
- putting i (1~n-1) into one group, while the rest are distributed based on the best case in f(n-i).
As a result, we get the following simplified algorithm:

initialize all f[n] to 1;

for i = 1 to n
for j = 0 to i-1
if(j=0) f[i] = max(i, f[i]);
else f[i] = max(f[i], LCM(i-j, f[j]);

return f[n];

Again if n=26, then f(n) = 1260. Note that j=0 in the above code is important to cover the boundary case; if you start from j=1 the calculation will be off by one (f[n] actually gives the value for n-1).

Other stuff related to this problem:

1. Function for calculating LCM:


LCM(i, j)
{
k = i*j;

//calculate the Greatest common divisor of i and j;
while i!= 0 {
if i < j {
tmp = i; i=j; j=tmp;
}
i = i%j;
}
return k/j;
}


2. You can notice that in both the complex solution and the simpler solution, there is an important assumption, the calculation of f(n) (f[n, m] in the first case) can be build upon the value calculated in f(i) (f[i, m-1] in the first case) where i is smaller than n. Still need to find a way to prove the above algorithms are correct based on this assumption.

Baggage fees for different airlines

Here is a list of the fees that the airlines will charge you for domestic flights within US unless you have achieved certain status in their mileage program.

Airlines 1st Checked Bag 2nd Checked Bag
American$15$25
UnitedNone$25
DeltaNone$25
US AirwayNone$25
SouthwestNoneNone
ContinentalNone$25
AlaskaNone$25
Midwest ExpressNone$20

Sunday, June 01, 2008

Uploading AVI or big images to Gallery

As long as ffmpeg plugin is installed properly, you should be able to upload video files to Gallery.

However, while using Gallery Remote, I kept getting some strange errors every time I tried to upload some AVI files. The error messages does not help much to resolve the issue. After some research, I found that the real problem is really the upload size limitation of your web server, nothing else.

One way to solve the problem is to change the upload size limitation within php.ini (usually located under /usr/local/lib/):
post_max_size => your preferred size
upload_max_filesize => your preferred size