Shuffling

codinghorror · December 4, 2007, 12:00am

Thank you for not posting suck this time.

No-- thank you!

MarkH · December 4, 2007, 12:00am

@Alex – duly noted though to the Mersenne Twister’s credit, the author specifically indicated that MT was not crypto-secure. The author’s implementation of MT is a chunk of highly optimized C code and is likely to be faster than good crypto-secure PRNGs, though I haven’t run benchmarks. MT grew out of the need for accurate Monte Carlo simulations.

Mike · December 4, 2007, 12:00am

"“Why even bother to shuffle the array? It’s probably easier and simpler to put the cards into a collection and then randomly pick one to remove based on the size of the Collection.”

I’ve used a similar approach in the past, though I seem to remember abandoning it for some reason. I can’t quite remember why though."

It’s more security conscience as well. Assuming the random number generator is good the only thing you can predict, via memory inspection, is what cards are left. As we all know people count cards in the casinos as it is, so it’s not like that’ll be giving anything away.

Brian · December 4, 2007, 12:00am

“var cards = Enumerable.Range(0, 51);
var shuffledcards = cards.OrderBy(a = Guid.NewGuid());”

How does
var shuffledcards = orderby(a=Guid.NewGuid())
work?

I wouldn’t think you could sort by a guid considering its not part of int.

jcnnghm · December 4, 2007, 12:00am

Years ago I was tasked with creating a card game, and given instructions to make it as realistic as possible, as it would be used to simulate actual card playing. At first, I thought about a true random shuffle, then I sat at my desk with a pack of cards, and actually shuffled, and I noticed that none of my shuffling was really very random at all.

The two main methods I was using were cutting the deck, and riffle shuffles. I decided to implement these, along with a few other shuffle methods directly in the code, and then call random shuffle methods, a random number of times. For example, you might have a cut called with an offset of 30, which would yield a top pile of 22 cards, and a bottom pile of 30, then the top pile would be moved to the bottom, followed by a riffle shuffle with an offset of 25, where there were a pile of 25 cards and a pile of 27, which were then combined one after another, with some weighting to determine the likelihood of two or more cards being grouped, followed by another cut.

In the end, the algorithm gave a more natural shuffle so to speak, and was much closer to the results of a human physically shuffling the deck. Given humanities ability for complex subconscious pattern recognition, it wouldn’t surprise me at all if normal nonrandom shuffling behavior affects poker betting, on and offline.

Andrew · December 4, 2007, 12:00am

This reminds me of a blackjack program on the VIC20 that got much slower towards the end of the pack because it was looking for a card it had not already used.

Speeking of seeding with milliseconds, even if you have your computer clock set at the same time precisely to the millisecond, how do you know exactly when the random numbers are being generated?

Those pesky preemptive windows messages also get in the way and disrupt the timing.

TraumaPony · December 4, 2007, 12:00am

There’s a third problem with the first method, and that’s got to do with the probability actually not equaling 1/count after you start choosing new ones. I can’t remember exactly, but I recall doing something quite similiar in school.

Mike_Lee · December 4, 2007, 12:00am

@Mike Random removal from a mutable array is the same approach I used last time, and the same approach I’ll use next time, unless someone shows me why it’s wrong.

Bill145 · December 4, 2007, 12:00am

@Shawn Poulson
Swap an integer, the XOR way:

Cute, but that isn’t as efficient as using a temp variable. Any optimizing compiler would locate t in a CPU register, using no actual memory (if we use integers). The xor method would move both variables into registers first before being able to operate on them.

unthinkableM · December 4, 2007, 12:00am

“Speeking of seeding with milliseconds, even if you have your computer clock set at the same time precisely to the millisecond, how do you know exactly when the random numbers are being generated?”

If you know roughly the time the RNG was seeded and the two cards you were dealt as well as the cards of the flop the cheat program could check a range of seed times until it gets the same cards. Looking at the screenshot on the linked page it seems that is exactly what is happening.

Jeff: Thanks for the interesting post!

StephenT · December 4, 2007, 12:00am

In Ruby:

(1…52).sort_by { rand }

FraserOrr · December 4, 2007, 12:00am

True random number generators are pretty easy. You don’t need nuclear decay. Actual random number generators are pretty cheap, and I bet must gambling sites have them. A DIY one is also easy. Plug a microphone into your sound card, put it next to the power supply fan, and read the sound coming in. Filter out all but the very highest frequency components, and you have a pretty strong source of randomness.

Dustin · December 4, 2007, 12:00am

SELECT CardNumber FROM dbo.Cards ORDER BY NEWID()

Albert · December 4, 2007, 12:00am

This reminds me of how I was taught sorting algorithms in high school and a paper I wrote later. My teacher used a deck of cards to teach me the basic sorts (Shell, Bubble, Quick, Insertion etc. etc.) In the paper I wrote I included an algorithm that was the least efficient (in terms of lines processed to achieve a sorted array).

The algorithm was the equivalent of throwing a pack of cards in the air and checking if it is sorted. If not, try it again. I used the real number between 0.0 and 1.0 approach in the algorithm.

I wish I could include the algorithm and the results here but I am currently at work.

RobertF · December 4, 2007, 12:00am

What does a GUID get you that, say, generating a number based on a good /dev/rand implementation not get? GUIDs are designed to be unique globally – across machines, etc., etc., which means that many implementations have a lot of redundancy (aka “waste”) when used as random seeds on a local box.

I do need to generate a number from a field larger than 52!, but that’s still just generating a number, so why not just do that?

One of my biggest beefs in software development is sandblasting the salt off a soup cracker: do what you need to do, and no more.

Mattkins · December 4, 2007, 12:00am

Nothing is ever truly random, just confusing enough to make a person think it is.

Alex_Lucas · December 4, 2007, 12:00am

Could you discuss detail as to why generating a guid is more “random” than calling random? And, is that REALLY a one-liner? It doesn’t look like you’re storing the GUID’s anywhere- how is it comparing against the GUID’s already representing other cards elsewhere in the deck?

Also, @Shawn: Swapping values “the XOR way” sets both your ints to 0 if they have the same value:D I read an article once citing that snippet as an example of “the dangers of clever code”

Andrew_R · December 4, 2007, 12:00am

@ Chris Johnson

I tend to agree, though talking about the internals of a PRNG to determine it’s suitability to a specific fixed task seems, to me, to be missing a crucial point. So many programmers treat PRNGs as something to initialise once and then use to get a potentially infinite number of random bits out of. This is wrong. A PRNG has INPUTS as well as an output. You should be constantly feeding the inputs with anything even remotely random that you can lay your grubby mitts on… keystroke timings, hard-disk response times, bytes received at network layer, CPU cycle count, ring-0 context times, etc. Everything. Anything. All the time.

The Fortuna PRNG by Bruce Schneier Niels Ferguson might have potentially 10s of thousands of bits in the entropy accumulators - from which it can generate an output stream. It is really an impressive piece of work and given enough diverse inputs rapidly achieves immunity from poisoning.

PallM · December 4, 2007, 12:00am

Your comment about O(n) for Knuth Fisher-Yates shuffle algorithm is ignoring a subtle detail. Since there are n! different permutations and representing n! takes log(n!) = O(nlog(n)) bits. So sampling uniformly from all permutations will take at least time O(nlog(n)). Of course we can assume that n fits into a 32 bit number and we’re fine, but for ginormous numbers this would become an issue.

WesternI · December 4, 2007, 12:00am

@Andrew R

I believe there are ways in which this can bite you. Most of us are probably ill-equipped to even measure the true randomness of any such source of entropy, for one thing. I certainly am.

There are – as you point out – expert-designed schemes for entropy distillation and accumulation. When true randomness is important, perhaps we should use those systems, and not just “roll our own.”