A few days back I was browsing some of Jeff Atwood's older blog entries and stumbled across one titled, The Danger of Naivete. In it, Jeff discusses an algorithm for randomizing an array that, while simple and seemingly intuitive, is subtlely incorrect in that it over-weights certain permutations and under-weights others. The naive algorithm enumerates each element in the array and swaps it with a randomly selected position in the array. Here's the naive algorithm implemented in C# code.
Random rnd = new Random();
for (int i = 0; i < deck.Length; i++)
{
// Set swapWithPos a random position such that 0 <= swapWithPos < deck.Length
int swapWithPos = rnd.Next(deck.Length);
// Swap the value at the "current" position (i) with value at swapWithPos
int tmp = deck[i];
deck[i] = deck[swapWithPos];
deck[swapWithPos] = tmp;
}
While reading Jeff's blog entry, my stomach sank because I knew that I had authored an article on randomly ordering an array on 4Guys back in the day (circa 2000). I was certain that I had implemented the naive and incorrect algorithm. Upon digging up the old article - Randomly Reordering an Array - I realized that I had used the naive algorithm. Urp. At least in describing my implementation I labeled the algorithm a “quick and dirty way to [randomly reorder an array].”
Of course there are algorithms that randomly reorder an array with an even distribution of permutations. What's interesting is that the simplest such correct algorithm - the Knuth-Fisher-Yates shuffle - is very similar to the naive algorithm. Depending on your programming language of choice, you can likely morph the incorrect naive algorithm into the Knuth-Fisher-Yates algorithm by changing less than 10 characters in the code file.
Rather than blindly change the old article, I wanted to better understand why the naive algorithm doesn't work. Despite Jeff's explanation of why the naive algorithm over-weights certain permutations and why this doesn't happen with the Knuth-Fisher-Yates shuffle, I still had a mental roadblock. It wasn't until I did what Jeff suggested - sit down and draw out the various possible permutations from the naive algorithm vs. the Knuth-Fisher-Yates shuffle - that things came into focus. I also find that having to explain some technology or algorithm or concept helps bolster my understanding of it, in part because any parts that I have trouble explaining force me to do further research. Therefore, this week's 4Guys article - Techniques for Randomly Reordering an Array - looks at ways for shuffling an array and highlights what's wrong with the naive approach and how other solutions account for this. Because Jeff's blog entry was the impetus for Techniques for Randomly Reordering an Array, the two are very common. However, in my article I show the probability breakdown of the naive algorithm (and analyze another approach mentioned by Jeff in his blog entry Shuffling).
In a nutshell, the naive algorithm computes n^n possible outcomes, yet there are only n! permutations available. In the case of a three element array, there are 3^3 = 3 * 3 * 3 = 27 possible outcomes from the naive algorithm yet only 3! = 3 * 2 * 1 = 6 permutations. Because 6 does not divide 27 there must be some potential outcomes that are over-weighted by the naive algorithm and others that are under-weighted. To make this point more clear, check out the following diagram. It shows the possible outcomes when shuffling a three element array whose elements initially have the values [0, 1, 2]. As you can see, the end result is that the permutations [0, 2, 1], [1, 0, 2], and [1, 2, 0] appear five times while the permutations [0, 1, 2], [2, 0, 1], and [2, 1, 0] each appear only four times.
When facing a particularly tough probabilty problem sometimes it helps to sit down and draw out the possible outcomes for the given scenarios. In doing so, things that may otherwise seem very complex boil down to a very apparent solution or explanation.
A great example of simplicity through diagraming is the Monty Hall problem, which goes like this: You are a contestant on the TV game show, Let's Make a Deal and are presented with three doors. Behind one door there is a luxury car. Behind the other two doors there is a poster of a car. The game show host asks you to pick a door. After choosing one, the game show host reveals one of the other doors that has a poster behind it and then asks you if you want to change your door choice to the other unopened door. What is the best strategy?”
The answer is that you should change the door you had selected, because there's a 2/3 probability that the door you did not select houses the luxury car. In other words, there is a 2/3 probability that the door you picked has the remaining poster. The solution seems backwards. It seems paradoxical that the door the host reveals to have a poster of a car could have any influence on your initial selection. But if you sit down and draw out the various probabilities you can see that there's there are more good outcomes if you make the switch than if you hold fast to your initial choice. The following diagram was created by Rick Block; here, instead of a poster of a car you get a goat. Same difference.
While penciling out the various outcomes of a probability problem doesn't feel very sophisticated, it can often be a quick and surefire way to better understand a problem.