Shuffling

Jeff, I actually found your initial method to be "weird", so to speak. I've always used the "assign a random id and sort" method myself; I can't even remember if I saw it online or came up with it. But even if it weren't for the big-O, it's still the "logical" way of doing it.

Correct me if I'm wrong here: when you replace each card with a randomly selected card, you're going to swap some cards twice and some only once, because as you move along the deck you can swap a card with one either ahead of or behind your current position. When you later reach a card that has already been swapped, you swap it again, and so on and so forth.
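For reference, the swap-with-any-position loop under discussion looks like this as a Python sketch (illustrative only, not Jeff's actual code; the bias is the point):

```python
import random

def naive_shuffle(cards):
    """Swap each card with a card at ANY random position (biased!)."""
    n = len(cards)
    for i in range(n):
        j = random.randrange(n)  # j can land anywhere, including positions already visited
        cards[i], cards[j] = cards[j], cards[i]
    return cards
```

Because j ranges over all n positions at each of the n steps, there are n**n equally likely execution paths but only n! permutations, and n! does not evenly divide n**n for n > 2.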

This is closest to the behavior of a “manual” shuffle.

But when you assign a random integer to each card and sort by it, you're guaranteeing that every card has the same probability of ending up in any position.

Both end-results are as “random” as each other, but the second method gets you a more “equal” randomness - which is something that counts in cryptography.
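A minimal Python sketch of that assign-and-sort approach (illustrative only; the important detail is that each key is generated exactly once, before sorting):

```python
import random

def sort_shuffle(cards):
    """Shuffle by pairing each card with a random key generated once, then sorting."""
    keyed = [(random.random(), card) for card in cards]  # one fixed key per card
    keyed.sort(key=lambda pair: pair[0])                 # sort only by the keys
    return [card for _, card in keyed]
```

This is unbiased as long as the keys are independent, uniform, and collisions are vanishingly rare.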

Although you’ve correctly identified some of the things a secure card shuffling algorithm needs to do, you’ve missed a lot of others. Chris Johnson brought up one issue in his comment above, and you can read some more at http://en.wikipedia.org/wiki/Shuffling#Implementation_of_shuffling_algorithms

Hey, where are the test results? You claim that your algorithm is a "secure, unbiased" shuffle, but is it? Is it even better than the built-in RNG? Please run your shuffling algorithm through a test battery like the NIST suite or TestU01 and let us know how that goes.

http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html
http://www.iro.umontreal.ca/~simardr/testu01/tu01.html

[Also: When n=52, big-O notation doesn't mean squat. For small n, an O(n log n) algorithm can easily beat an O(n) algorithm depending on the constants involved. O(n) analysis generally assumes that memory access is O(1) and arithmetic operations are all equally fast, but we all know neither of those assumptions is true. At modern CPU speeds, memory access patterns and CPU instruction scheduling dominate the cost of many algorithms – especially RNGs.]

You should see what pokerstars.com writes on its website about its shuffle. For the random number generator, it supposedly uses, among other things, mouse movements of the users!

Greg

Funnily enough, I wrote a quick implementation of Knuth's shuffle in Haskell a couple of days ago: http://hpaste.org/4216#a1.

All right, everyone who didn’t mention TAOCP Vol-2, hand in your geek cards right now, you’re all banished for the next 12 months.

For the newbies out there scratching their heads and asking "what the fuck is this TAOCP thingie?": just google it.

My 1st game was a Tetris clone. I made it in Turbo Pascal 3.0!
I could not find a random() function, so I made my own by using a pattern of digits returned from logarithmic functions.

(*) If you want secure pseudo-random numbers, I think it's a really BAD idea to use known methods.

Ok, so PokerStars has done its homework. Any more good examples?

Good to know that the shuffles are properly done when you feel like playing Poker.

Jeff, I’m disappointed. There are two things I completely disagree with here.

Calling up a new GUID and using it to shuffle is too simplistic and abstracted. You're asking for someone to come along and spend some time reverse engineering their way back to a point where they can find a shuffle through a rainbow-table-style attack. Where's the 'random' in GUID creation? How do you know it's random enough? GUIDs aren't always perfect, and it could be possible to track them down if you were to use a static function like that. This is a problem where even 'theoretically possible' is not good enough. Ergo a custom-built algorithm with a phenomenal amount of randomness is the only worthwhile and fair solution.

The more important thing you forgot is that as soon as you put money on a table, it belongs to the casino. A better encryption or shuffling algorithm is only advertising and customer retention. The goal of any casino is to pull as much money as possible onto the table with the odds on its side (or pull out a percentage), then keep the players playing as long as possible. There is inefficiency in the system because the system is designed by those creating the inefficiency.

A casino programmer's job is to bring in as many players as possible and make the standard distribution at least APPEAR to apply, so you can continue siphoning money from more and more people… I mentioned fair solutions, but it may not be in the casino's best interest to have a fair solution all the time.

I do not think that shuffling is an "easily solved problem", especially since the shuffle algorithm you describe is very easy to get wrong. In fact, almost all swap-based algorithms are wrong.

You already made a mistake in your textual description: “So we loop through the deck, switching each card with another card from a random position in the deck”. That is wrong (doing just that would produce a biased shuffle) and it is also not what you are doing: you are not switching with any random card, but only with a random card that hasn’t been swapped yet!

That subtle difference is what separates a biased algorithm from an unbiased one (assuming the RNG is fair). Your textual description does not match your code, which is why the code is correct while the description is wrong.

A better way to describe the algorithm that you actually implemented (that is probably easier to understand and analyze as well) would be:

  • At the start the shuffled deck is empty, and all (unshuffled) cards are in a pile.
  • Remove a card from the pile at random, and add it at the end of the deck.
  • Repeat until the pile is empty.

The deck is now shuffled without bias.

The reason you use swaps at all is just to simulate the behaviour described above without having to allocate a separate data structure. In your for loop, elements at indices 0 through i represent the (unshuffled) pile, and elements from i + 1 through cards.Length - 1 are the deck (cards are added at the front of the deck in this implementation).
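That pile description translates almost line-for-line into code; here is a Python sketch (the explicit pile costs the extra allocation that the in-place swap version avoids by overlaying pile and deck on the same array):

```python
import random

def pile_shuffle(cards):
    """Draw cards at random from a pile and append them to the shuffled deck."""
    pile = list(cards)  # all cards start in the unshuffled pile
    deck = []           # the shuffled deck starts empty
    while pile:
        k = random.randrange(len(pile))  # pick a random card still in the pile
        deck.append(pile.pop(k))         # move it to the end of the deck
    return deck
```

Each permutation corresponds to exactly one sequence of draws, so the result is unbiased whenever randrange is fair.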

To make matters worse, your code for the shuffling-by-sorting is wrong. A sorting algorithm is only guaranteed to do anything meaningful if you call it with a consistent total ordering of the data. If you generate a different random key for the same element every time it is evaluated, the keys do not describe any ordering at all.

It probably works in practice because of the way the sorting algorithm works behind the scenes, but the correctness does not follow from the code. Some algorithms may not even terminate (or, more precisely, may have only a low probability of terminating) if you do not define a consistent ordering.
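To make the contrast concrete, here is a hedged Python sketch: an inconsistent random comparator versus keys generated exactly once. (Python's built-in sort happens to tolerate the bad comparator without raising; other implementations, such as .NET's Array.Sort, may throw or misbehave when the comparer is inconsistent.)

```python
import random
from functools import cmp_to_key

cards = list(range(52))

# WRONG in general: this comparator answers differently every time it is
# asked, so it does not define a consistent total order. The sort still
# permutes the list, but nothing about the result's distribution is guaranteed.
shuffled_badly = sorted(cards, key=cmp_to_key(lambda a, b: random.choice((-1, 1))))

# SAFE: generate each element's key exactly once, then sort by those fixed keys.
keys = {c: random.random() for c in cards}
shuffled = sorted(cards, key=keys.__getitem__)
```

The second form is the decorate-sort-undecorate pattern: the keys are frozen before the sort ever runs, so the sort sees a genuine total order.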

(DISCLAIMER: I don’t really know the language used (VB.NET?) so it may be that additional guarantees are given by the language or library that I may have missed.)

What about employing several shuffling methods and alternating between them "randomly"?

I’d just use the JDK function.

http://java.sun.com/j2se/1.4.2/docs/api/java/util/Collections.html#shuffle(java.util.List)

Funny that in my post about job interviews and reusing JDK functions to reverse a String, one poster called me arrogant, because I thought most people, when asked to write a String reverser, would implement it instead of reusing it.

http://stephan.reposita.org/archives/2007/11/12/job-interviews-for-developers-attention-to-detail/

Thanks for proving my point.

I don’t think anyone has mentioned HotBits yet (http://www.fourmilab.ch/hotbits/) – random bits made to order, generated from radioactive decay. You can build your own as well; instructions and code are on the site (the inline assembly brings back fond memories :wink:).

Peter: That is exactly why I suggested using a random real number between 0 and 1.

If the randomness source is crypto-strength and you generate batches of 52 double-precision sort numbers, then you should expect a collision in fewer than one out of every twenty trillion decks generated. (The math justifying this claim is left as an exercise.)

Crypto-strength GUIDs have a much smaller collision rate than doubles. I haven’t worked out the math exactly, but order of magnitude, it’d be around one out of every trillion trillion decks.

In short, this is not a significant source of bias.
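For anyone who wants to check those orders of magnitude, the usual birthday approximation is easy to compute. This sketch assumes each sort key contributes b independent uniform random bits; that bit count is an assumption about the generator, not something established in the thread:

```python
from math import comb

def deck_collision_rate(bits, n=52):
    """Approximate probability that any two of n random b-bit keys collide.

    Uses the birthday approximation C(n, 2) / 2**bits, valid while the
    result is much smaller than 1.
    """
    return comb(n, 2) / 2 ** bits

# e.g. deck_collision_rate(64)  is about 7.2e-17 (one deck in ~1.4e16)
#      deck_collision_rate(122) is about 2.5e-34 (version-4 GUIDs carry 122 random bits)
```

The answer swings by many orders of magnitude depending on how many truly random bits each key carries, which is exactly why the doubles-vs-GUIDs comparison above matters.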

SELECT CardNumber FROM dbo.Cards ORDER BY NEWID()

Dear god… No!!

Has Catto given up reading Coding Horror?

The problem with the swap algorithm is not that it is too complex. It is perfectly straightforward. The problem is that some of the n! possible outputs are more likely than others.
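That claim is easy to verify exhaustively for a tiny deck: the swap-with-anyone shuffle of 3 cards walks 3**3 = 27 equally likely paths, and since 27 is not divisible by 3! = 6, the six permutations cannot all be equally likely. A Python sketch that enumerates every path:

```python
from collections import Counter
from itertools import product

def outcome_counts(n=3):
    """Count how often each permutation results from the swap-with-anyone shuffle."""
    counts = Counter()
    # Every sequence of random choices (j0, ..., j_{n-1}) is equally likely.
    for choices in product(range(n), repeat=n):
        deck = list(range(n))
        for i, j in enumerate(choices):
            deck[i], deck[j] = deck[j], deck[i]
        counts[tuple(deck)] += 1
    return counts
```

Running outcome_counts(3) shows all six permutations occur, but with unequal counts summing to 27 — a bias no choice of RNG can repair.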

Eric, I read your original description of this, and I even went over it with another programmer here. We still don’t understand how the Knuth shuffling algorithm can have any bias, assuming the random number generator is crypto-strength.

Way back circa 1989 I implemented a blackjack program using the sorting algorithm you suggested. The interesting thing was that the “platform” was a spreadsheet – the whole thing was implemented in spreadsheet macros, and it was a complete implementation of the game, with insurance, splitting, etc.

The sort was done by having a named range of cards with 2 columns: the card itself, and the formula RAND(). To shuffle, I just told the spreadsheet to sort the range. I was pretty proud of that little thing.

At the time, I was in college and working part time as a tester of an office application suite. The guys in my department each implemented a different casino game as a means of pushing the tests as far as they’d go.

One guy did craps, and set it up to play automatically all night. The next morning we came in and it was rich, winning a bazillion dollars. We figured “that can’t be right – over time, you have to lose”, so we ran it again the next night, and got the same result. It turns out that he’d uncovered a flaw in the spreadsheet’s random number generator. Through some quirk, the dice rolls came up 11 (and 12) much more frequently than they should have.

Thank you, this post very clearly demonstrates the need to do quality work when developing software. A lot of times we take shortcuts and somewhere down the line, chaos happens. And thank you Mike for that excerpt from the PokerStars web site.

@Andrew R RE: “You should be constantly feeding the inputs with anything even remotely random that you can lay your grubby mitts on… keystroke timings, hard-disk response times, bytes received at network layer, CPU cycle count, ring-0 context times, etc. Everything. Anything. All the time.”

I recently stumbled across a Java RNG that sought to “improve” on the one in the library by feeding the result back as the seed. When working on numbers in fairly small ranges, it converged on stable patterns of repeating numbers very quickly. At 5 digits it only produced a handful of results over and over. The requirement in this app was “few duplicates”, regardless of any patterns or predictability. I just took out the “improvements.”

How do you factor in those feedback sources to avoid this problem?
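For what it's worth, the standard answer is to never feed raw output back as the next seed; instead, hash each new entropy event into a pool state, so a source can only ever add unpredictability. A toy Python sketch (the class and event names are made up for illustration; a real design such as Fortuna adds pooling schedules and reseed rules, so do not use this for production cryptography):

```python
import hashlib
import struct
import time

class EntropyPool:
    """Toy entropy pool: mix events into a hash state, derive output by hashing."""

    def __init__(self):
        self._state = b"\x00" * 32

    def feed(self, source_name, data):
        # Mixing is one-way: output never becomes the whole state, so a
        # short cycle like the feedback-seed bug above cannot form.
        self._state = hashlib.sha256(
            self._state + source_name.encode() + data
        ).digest()

    def random_bytes(self, n):
        out = b""
        counter = 0
        while len(out) < n:
            out += hashlib.sha256(self._state + struct.pack(">Q", counter)).digest()
            counter += 1
        # Ratchet the state forward so past outputs cannot be recovered from it.
        self._state = hashlib.sha256(self._state + b"ratchet").digest()
        return out[:n]

# Feed whatever weak entropy you have; each event can only help, never hurt.
pool = EntropyPool()
pool.feed("keystroke-timing", struct.pack(">d", time.perf_counter()))
```

The key property is that the pool state is never exposed directly and never replaced wholesale by its own output, so the "converges to a handful of values" failure mode cannot occur.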