Interesting article.
I did come up with the “good” algorithm after some thinking.
But I didn’t do a frequency test. Reminds me to do one next time I build something that’s supposed to be “random”. It’s the only right way to check whether your numbers are actually random.
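As a sketch of what such a frequency test can look like (JavaScript here, with names of my own choosing), this counts how often each ordering of a 3-card deck comes out of the biased shuffle the article discusses:

```javascript
// Frequency test: run the biased "naive" shuffle on [1,2,3] many
// times and tally how often each of the 3! = 6 orderings appears.
function naiveShuffle(arr) {
  const a = arr.slice();
  for (let i = 0; i < a.length; i++) {
    // The bug under test: j ranges over the WHOLE array every time
    const j = Math.floor(Math.random() * a.length);
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

const counts = {};
const trials = 270000;
for (let t = 0; t < trials; t++) {
  const key = naiveShuffle([1, 2, 3]).join("");
  counts[key] = (counts[key] || 0) + 1;
}
console.log(counts);
```

With a uniform shuffle every count would hover near trials / 6; the biased loop instead produces 132, 213, and 231 measurably more often (5/27 each rather than 1/6).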
They don’t monitor the connection locally… they monitor it on their server.
Wow, yet again you post an excellent post! This has got to be my favorite blog, and I check it every other day. Keep the programming posts coming
Sweet post.
Jeff,
I second in the query as to which app you used to generate your graphs. Care to share?
Thanks for the great post Jeff! It’s an eye opener. It brilliantly shows that even the simplest bit of code may require quite advanced testing to detect subtle flaws.
Why do you call the first, biased algorithm, “naive”? My reaction when reading the code was “oh, never thought of that one!” To me it actually looked less intuitive than the KFY algorithm. Each iteration of KFY brings one random card into place, then leaves it there. The “naive” one messes with its partial shuffling, just like combing a giraffe.
I’m surprised that the focus of this article is on testing, rather than proving, the incorrectness of the so-called “naive” algorithm.
If you’re going to be designing critical, non-trivial algorithms for your software, testing alone isn’t enough, because it’s impossible to test for all input values. It’s also easy to accidentally miss important boundary conditions.
Funny that no one has mentioned the Faro shuffle, which is a very quick shuffling method:
http://en.wikipedia.org/wiki/Faro_shuffle
You just wanted to play with your graph chart generator. Admit it.
Is it not the case that when we need an algorithm to search, shuffle, or sort, we should be copying it out of textbooks instead of naïvely assuming we can do just as well as theoretical computer scientists?
It is not. What you should be doing is learning the algorithm out of the textbook. The ability to understand and implement algorithms devised by “theoretical computer scientists” is one measure of a good programmer. If you copy the algorithm blindly, you may implement it, but I claim you’ll probably never know whether you implemented it correctly or well.
In JavaScript, I usually do this:
function shuffle(arr) {
    return arr.sort(function () { return Math.sin(Math.random() * 360); });
}
It’s not efficient or secure, but hey, it’s fun!
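For contrast, the unbiased Fisher-Yates (Knuth) shuffle from the article is barely longer in JavaScript; a minimal sketch:

```javascript
// Fisher-Yates (Knuth) shuffle: walk backwards through the array,
// swapping each position with a random position at or before it.
function fisherYatesShuffle(arr) {
  const a = arr.slice(); // copy so the input is left untouched
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // 0 <= j <= i
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}
```

Given a perfect random source, each of the n! orderings comes out with probability exactly 1/n!, which the sort-with-random-comparator trick can’t guarantee.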
“If you copy the algorithm blindly, you may implement it, but I claim you’ll probably never know whether you implemented it correctly or well.”
More importantly, it is unlikely the text book will address exactly the problem before you. You need to understand how your requirements are different and modify things accordingly and be able to be sure that your modified algorithm is correct and scales well on the axes required.
Come on, I want to see the graphical comparison with a 52-card deck! Ya left me hanging.
There is a strange non-random behavior of the VB RANDOMIZE statement. My alternative shuffle I’ve used:
' Initialize the deck
ReDim Deck(UBound(a))
For n = 0 To UBound(a)
    Deck(n) = n + 1
Next
Randomize
For n = UBound(a) To 1 Step -1
    nSelect = Int(Rnd() * (n + 1))
    ' assign to the random array value
    a(n) = Deck(nSelect)
    ' replace array entry with unused entry
    Deck(nSelect) = Deck(n)
Next
a(0) = Deck(0)
Typically results in:
123: 166818
132: 166768
213: 166347
231: 166712
321: 166508
312: 166787
But moving the RANDOMIZE statement into the FOR loop results in:
123: 161243
132: 180510
213: 172909
231: 174002
321: 159293
312: 152043
with 132 consistently the most frequent, and 321 and 312 consistently the least frequent.
I’ve read the article and it is very interesting.
I thought I’d made that error (I’ve just started studying programming) shuffling a deck, but…
I didn’t use that naive algorithm; I used another one…
I’ll say upfront that it is SLOW and STUPID, not at all as smart and logical as yours, but…
here’s the Java code:
public static void shuffle(int[] a) { // using an array of int
    Random ran = new Random();  // I use 2 Randoms because with just
    Random ran2 = new Random(); // one it isn't random AT ALL
    for (int i = 0; i < 1000; i++) { // swapping 2 ints a lot of times (1000); 1000 is big compared to 3, but it works "fine" even with 52
        int kas = ran.nextInt(a.length);
        int kas2 = ran2.nextInt(a.length);
        swap(a, kas, kas2);
    }
}
I tried to measure my error with a deck of 3, shuffling 6 million times.
After 5 minutes of computation (when I said it is slow I meant SLOW), my average error from the expected 100/6 % is 0.027 % (let me say: nothing).
The KFY (using a Java method with the Random class) gives 0.45 % (let me say: a lot). I know the 0.45 is probably caused by a lack of randomness in the Random class, but what I want to say is that my algorithm (okay, “naive” is a nice description) is PERFECTLY WORKING. True, I didn’t try with a bigger deck; I will. But that result speaks clearly to me!
Thank you and Bye
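For what it’s worth, the “average error” reported above appears to be the mean deviation of each ordering’s observed share from the expected 100/6 ≈ 16.67 %; the exact metric is my guess. A sketch in JavaScript:

```javascript
// Average absolute deviation (in percentage points) of each
// ordering's observed share from the expected 100/6 percent.
// `counts` maps each ordering to how many times it was seen.
function averageErrorPercent(counts, trials) {
  const expected = 100 / 6;
  const values = Object.values(counts);
  let total = 0;
  for (const c of values) {
    total += Math.abs((100 * c) / trials - expected);
  }
  return total / values.length;
}
```

A perfectly uniform tally (every ordering seen exactly trials / 6 times) gives an error of 0 under this metric.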
http://www.sgi.com/tech/stl/random_shuffle.html
This algorithm is described in section 3.4.2 of Knuth (D. E. Knuth, The Art of Computer Programming. Volume 2: Seminumerical Algorithms, second edition. Addison-Wesley, 1981). Knuth credits Moses and Oakford (1963) and Durstenfeld (1964). Note that there are N! ways of arranging a sequence of N elements. Random_shuffle yields uniformly distributed results; that is, the probability of any particular ordering is 1/N!. The reason this comment is important is that there are a number of algorithms that seem at first sight to implement random shuffling of a sequence, but that do not in fact produce a uniform distribution over the N! possible orderings. That is, it’s easy to get random shuffle wrong.
I have a severe problem with the concept that a specific set of, for example, cards would have equal draws for each card after 6,000 or a jillion-trillion cycles of drawings. That’s not random. Random has inequalities throughout and in the final count. To use equal times drawn as the test of randomness is really missing the mark.
Going to sleep just now, I realized what a stupid comment I made: because a deck of cards IS a specific set, the only good test of random draws is exact equality of times drawn; every card will be drawn exactly one time per each time the deck is drawn. Sorry… I knew I was out of my league here. I was thinking big random, whereas the discussion here is about little random. Go to StumbleUpon and you’ll see I commented that I may have embarrassed myself. Glad I wasn’t flippant.
YES! I finally see the difference between the two algorithms! =D
I realize this is an old article and I might be beating a dead horse here, but I’d like to draw another concrete analogy.
Suppose we wanted a single random number from 1-5 inclusive, and all we had was a 6-sided die.
In the naive algorithm, if we rolled a 1, 2, 3, 4, or 5, everything would be hunky-dory. But we might roll a 6 and then just call it a 1 – of course, in this case, the numbers 2-5 each have a 1/6 chance of being our random number, but the number 1 has a (1/6 + 1/6) chance of being our random number. We have no choice but to throw away the 6 if we want an even probability in 1-5.
On the other hand, if we asked one of our D&D-playing friends for a 20-sided die, we could map 4 numbers on the die to each of the random numbers we want; i.e., 1, 6, 11, or 16 would be counted as a 1, and 2, 7, 12, or 17 would be counted as a 2, etc. Just something to keep in mind if the uniform random number you need is in a range of a power of 2 and you can request a certain number of random bits…
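Both ideas in this comment can be sketched in JavaScript (function names are mine, and Math.random stands in for a physical die):

```javascript
// A fair n-sided die simulated with Math.random.
function rollD(sides) {
  return 1 + Math.floor(Math.random() * sides);
}

// Rejection sampling: turn a fair d6 into a uniform 1-5
// by throwing away any 6 and rolling again.
function uniform1to5ViaD6() {
  let r;
  do {
    r = rollD(6);
  } while (r === 6);
  return r;
}

// The d20 alternative: 20 is an exact multiple of 5, so four
// die faces map cleanly onto each target value with no rerolls.
function uniform1to5ViaD20() {
  return ((rollD(20) - 1) % 5) + 1;
}
```

The rejection approach discards a sixth of the rolls on average; the d20 mapping needs none, precisely because 20 divides evenly into groups of 5.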