“@Tom Dibble
"The WRONG approach would be to pick one in the list at random, then another after it at random, etc, as this would unduly weight the later-in-list items.”
Actually no (unless I misunderstood what you meant)."
You did, but mainly because I wrote it poorly
I meant to describe an approach I’d fixed before, in which a programmer searching for a random combination of “r” items in a group of “n” would, essentially:
until ( found one )
… int minindex = 0;
… for ( i from 1…r )
… … minindex = random # between minindex and ‘n’
… … add item at ‘minindex’ to combination
… … minindex++;
…
Obviously, that’s a bit pseudo-codish, but the main thing is that the “next index” is alwasy a (random number of indexes) AFTER the “last index”. So, on average, the first index chosen skips half the list and eliminates that first half from consideration …
In any case, my point was that someone felt that this approach was either elegant (which I can’t imagine they really convinced themselves of that … any time you have an “until this works” loop you’ve pretty much written off “elegant”) or simple to understand despite possible performance problems, which I suppose might fit Jeff’s definition of “naive”, but definitely fits my definition of “WRONG”.
@Jeff:
"A nave algorithm is a very simple solution to a problem. It is meant to describe a suboptimal algorithm compared to a “clever” (but less simple) algorithm. Nave algorithms usually consume larger amounts of resources (time, space, memory accesses, …), but are simple to devise and implement.
It shouldn’t be wrong, just inefficient."
Well, the article spent a lot of time discussing what happens when something is “wrong”, not “correct yet naive”. I suppose that’s the base of confusion here in the comments.
I’d suggest there are two orthogonal issues here: “wrong” versus “right” and “simple” versus “complex”. It is just as incorrect to conflate “simple” with “wrong” as it is to conflate “simple” with “right”. “Naive” appears to be a synonym for “simple” (as opposed to “simple AND wrong”), but I’m not sure I’d agree with the connotations of that. Sure, “simple” isn’t always the best answer. The vast majority of the time, though, it is.
Using the shuffle approach, and pretending there isn’t a perfectly-simple-yet-properly-complex implementation of “shuffle” in pretty much any modern language, let’s examine some options here:
Simple, Wrong: The “exchange items with bug” approach listed in this article.
Simple, Right: Assign a random number to each item, sort according to that random number. Doesn’t perform well, but works. And is easy to explain to the next guy.
Elegant, Right: Use the published “exchange” algorithm. Performs well, works, but is hard to explain to the next guy why each line is precisely how it is without saying “just read the reference material!”
There’s a place where performance matters above maintenance, and there’s a place where the reverse is true. The trick for us as developers is figuring out which place (or where in the continuum between the two) we are in, and thus which approach to the problem makes the most sense for us.
In my previous post (in response to the question about pulling one of the 100C10 combinations) I gave two solutions, one of which was simple and easy to understand, the other of which would take several pages with diagrams to explain (but performs and scales significantly better than the simpler solution).
Which is the right one to use? I’d say you always start with the simple one, then move on to more complex implementations only when you’ve identified that performance is an issue. Moving forward, the difference is often between “anyone working on the application can touch this algorithm” and “any change to this algorithm needs to be looked at by YOU, yes, YOU, the original developer”. The former tends to foster sanity far more than the latter.
“Naive” tends to connote an incorrect choice. “Simple” is not always, and in fact is not even often, an incorrect choice.