Scott on Writing

Musings on technical writing...

Tuesday, July 01, 2008 #

For Some Probability Problems, Seeing Can Be Believing

A few days back I was browsing some of Jeff Atwood's older blog entries and stumbled across one titled, The Danger of Naivete. In it, Jeff discusses an algorithm for randomizing an array that, while simple and seemingly intuitive, is subtlely incorrect in that it over-weights certain permutations and under-weights others. The naive algorithm enumerates each element in the array and swaps it with a randomly selected position in the array. Here's the naive algorithm implemented in C# code.

Random rnd = new Random();

for (int i = 0; i < deck.Length; i++)
{
   // Set swapWithPos a random position such that 0 <= swapWithPos < deck.Length
   int swapWithPos = rnd.Next(deck.Length);

   // Swap the value at the "current" position (i) with value at swapWithPos
   int tmp = deck[i];
   deck[i] = deck[swapWithPos];
   deck[swapWithPos] = tmp;
}

While reading Jeff's blog entry, my stomach sank because I knew that I had authored an article on randomly ordering an array on 4Guys back in the day (circa 2000). I was certain that I had implemented the naive and incorrect algorithm. Upon digging up the old article - Randomly Reordering an Array - I realized that I had used the naive algorithm. Urp. At least in describing my implementation I labeled the algorithm a “quick and dirty way to [randomly reorder an array].”

Of course there are algorithms that randomly reorder an array with an even distribution of permutations. What's interesting is that the simplest such correct algorithm - the Knuth-Fisher-Yates shuffle - is very similar to the naive algorithm. Depending on your programming language of choice, you can likely morph the incorrect naive algorithm into the Knuth-Fisher-Yates algorithm by changing less than 10 characters in the code file.

Rather than blindly change the old article, I wanted to better understand why the naive algorithm doesn't work. Despite Jeff's explanation of why the naive algorithm over-weights certain permutations and why this doesn't happen with the Knuth-Fisher-Yates shuffle, I still had a mental roadblock. It wasn't until I did what Jeff suggested - sit down and draw out the various possible permutations from the naive algorithm vs. the Knuth-Fisher-Yates shuffle - that things came into focus. I also find that having to explain some technology or algorithm or concept helps bolster my understanding of it, in part because any parts that I have trouble explaining force me to do further research. Therefore, this week's 4Guys article - Techniques for Randomly Reordering an Array - looks at ways for shuffling an array and highlights what's wrong with the naive approach and how other solutions account for this. Because Jeff's blog entry was the impetus for Techniques for Randomly Reordering an Array, the two are very common. However, in my article I show the probability breakdown of the naive algorithm (and analyze another approach mentioned by Jeff in his blog entry Shuffling).

In a nutshell, the naive algorithm computes n^n possible outcomes, yet there are only n! permutations available. In the case of a three element array, there are 3^3 = 3 * 3 * 3 = 27 possible outcomes from the naive algorithm yet only 3! = 3 * 2 * 1 = 6 permutations. Because 6 does not divide 27 there must be some potential outcomes that are over-weighted by the naive algorithm and others that are under-weighted. To make this point more clear, check out the following diagram. It shows the possible outcomes when shuffling a three element array whose elements initially have the values [0, 1, 2]. As you can see, the end result is that the permutations [0, 2, 1], [1, 0, 2], and [1, 2, 0] appear five times while the permutations [0, 1, 2], [2, 0, 1], and [2, 1, 0] each appear only four times.

When facing a particularly tough probabilty problem sometimes it helps to sit down and draw out the possible outcomes for the given scenarios. In doing so, things that may otherwise seem very complex boil down to a very apparent solution or explanation.

A great example of simplicity through diagraming is the Monty Hall problem, which goes like this: You are a contestant on the TV game show, Let's Make a Deal and are presented with three doors. Behind one door there is a luxury car. Behind the other two doors there is a poster of a car. The game show host asks you to pick a door. After choosing one, the game show host reveals one of the other doors that has a poster behind it and then asks you if you want to change your door choice to the other unopened door. What is the best strategy?”

The answer is that you should change the door you had selected, because there's a 2/3 probability that the door you did not select houses the luxury car. In other words, there is a 2/3 probability that the door you picked has the remaining poster. The solution seems backwards. It seems paradoxical that the door the host reveals to have a poster of a car could have any influence on your initial selection. But if you sit down and draw out the various probabilities you can see that there's there are more good outcomes if you make the switch than if you hold fast to your initial choice. The following diagram was created by Rick Block; here, instead of a poster of a car you get a goat. Same difference.

While penciling out the various outcomes of a probability problem doesn't feel very sophisticated, it can often be a quick and surefire way to better understand a problem.

posted @ 2:54 PM | Feedback (4)

My Links

Ads Via DevMavens

Archives

Post Categories

 

I am a Microsoft MVP for ASP.NET.
I am an ASPInsider.
<July 2008>
SMTWTFS
293012345
6789101112
13141516171819
20212223242526
272829303112
3456789

Comment Stats

DayTotal% of Total
Sunday 2056.8%
Monday 42514.1%
Tuesday 51917.2%
Wednesday 55618.4%
Thursday 58019.2%
Friday 54718.1%
Saturday 1886.2%
Total 3020100.0%

Hour1Total% of Total
12:00 AM 782.6%
1:00 AM 812.7%
2:00 AM 682.3%
3:00 AM 822.7%
4:00 AM 692.3%
5:00 AM 1264.2%
6:00 AM 1193.9%
7:00 AM 1816.0%
8:00 AM 1926.4%
9:00 AM 1585.2%
10:00 AM 1886.2%
11:00 AM 1936.4%
12:00 PM 2016.7%
1:00 PM 1846.1%
2:00 PM 1695.6%
3:00 PM 1354.5%
4:00 PM 1153.8%
5:00 PM 1073.5%
6:00 PM 1013.3%
7:00 PM 1073.5%
8:00 PM 923.0%
9:00 PM 882.9%
10:00 PM 913.0%
11:00 PM 953.1%
Total 3020100.0%

Comments by Blog Entry Date/Time

Day Entry MadeAvg.Total
Sunday 5.00160
Monday 4.80384
Tuesday 4.04477
Wednesday 7.39680
Thursday 6.26676
Friday 5.07466
Saturday 4.78177
Total 5.403020

Hour1 Entry MadeAvg.Total
12:00 AM 5.2937
1:00 AM 1.002
5:00 AM 0.000
7:00 AM 3.8550
8:00 AM 3.72134
9:00 AM 6.06297
10:00 AM 5.63276
11:00 AM 4.22194
12:00 PM 6.16351
1:00 PM 3.09133
2:00 PM 4.89230
3:00 PM 7.67322
4:00 PM 4.00108
5:00 PM 6.07170
6:00 PM 4.64116
7:00 PM 8.95188
8:00 PM 8.63164
9:00 PM 5.00115
10:00 PM 6.31101
11:00 PM 4.5732
Total 5.403020

Learn More About Comment Stats
1 - All times GMT -8...


Blog Stats

Favorite Web Sites

My Books

My MSDN Articles