Shuffling

Interesting. Your physical definition of shuffle differs from mine. You’re using the computer science notion of shuffle, not what actually occurs when one shuffles a deck of cards in the standard way.

As part of a course on counting theory, I built a card shuffling algorithm that models the way that humans shuffle a deck.

That algorithm, roughly, is:

  1. Divide the deck into two stacks.
  2. Pop the top item from the first stack into a result stack.
  3. Pop the top item from the second stack into the result stack.
  4. Repeat steps 2-3 until both stacks are exhausted, and the result stack contains all 52 cards.

What’s interesting about this algorithm is that the results appear random by human observers, but are quite deterministic. If you shuffle the stack perfectly either 8 or 26 times (depending on implementation of the algorithm), the deck returns to the original order!

This is the algorithm used by most card sharks and all automatic card shufflers in Reno and Vegas, even if they don’t call it an algorithm.

Then as part of this college course, I looked at the entropy introduced by non-perfect shuffling. Which was pretty interesting. It came in handy when working in Reno, although I wasn’t legally allowed to bet anything.

By the way, thanks for the link to Cigital’s case study! Contact me if you have any questions, or want some code for the above algorithm:
bwalther@cigital.com

By the way, Cristian’s Python code is wrong:

He said:
import random
shuffled_deck = random.shuffle(range(52))

it should be:
import random
deck = range(52)
random.shuffle(deck,random.random)

The shuffle sorts it in place, and returns NULL. The second variable is not optional; it is what specifies which random number generator to use. If none is specified, the deck is not randomized at all.

(in Python 2.5 at least)

@ Western Infidels
"… ill-equipped to even measure the true randomness of any such source of entropy… expert-designed schemes for entropy distillation and accumulation… not just “roll our own.”"

I agree. Absolutely. In fact, I’m pretty sure that is precisely what I said.

@ Jim
"I recently stumbled across a Java RNG that sought to “improve” on the one in the library… I just took out the “improvements.”

Good for you. I hope that worked out for you, though I conjecture that both versions are rubbish.

Did either of you realise what you were actually suggesting in your following statements abot entropy sources?
Let’s review…

So…
You’re suggesting that a PRNG have NO source of entropy?
Perhaps you’re suggesting that somehow the PRNG has a magic internal yet-software-based source of entropy?

Entropy has to come from somewhere and it sure ain’t embedded in the PRNG itself. Meanwhile, I can’t judge the value of any source. But I can provide a crapload of them, which will result in the largest possible yield of entropy. So I’ll do that and trust that the person who wrote the PRNG is pretty smart and knows to push their entropy sources into block-cipher/hash algorithms like SHA1 and mix it all up etc.

I suggest you both read up on the Fortuna PRNG.

I repeat: it is an impressive piece of work - I cannot do it justice here.

I was about to post that someone should build a web service that provided realistic random numbers, but with a bit of quick googling it appears that it exists already. And as another commenter suggested, it uses environmental noise. Cool stuff.

http://www.random.org/

The sorting shuffle is biased! Here’s why:

Sometimes, the random number generator will assign the same number to two different cards. (Because of the Birthday Paradox, this can happen much more often than you might expect.) The order of cards assigned the same number is then just a property of the sorting algorithm. For instance, if you’re using a stable sort, your shuffler will have a slight order-preserving bias.

Whether this bias is large enough to worrying about depends on the context, but it can be hard work to prove that the sorting shuffle is good enough for your purposes. So why not just use the Knuth shuffle instead? It’s really not so hard; I can describe it in one sentence: “To shuffle n cards, shuffle the first n-1 cards, and then swap the nth card with the card at a randomly chosen location.”

There is no random, there is only unpredictable.

I’m working on a gambling program, and made the decision to collect random #s from www.random.org. This site makes the following statement: RANDOM.ORG offers true random numbers to anyone on the Internet.

It’s a different approach - I created a data file of 10k #s, and run through them in a circular fashion each time I need a new #.

Jeff: Julian’s Array Swap algorithm is not the same as Knuth’s. The difference is in the line:

R = Random number from 1…52

Knuth’s version would be:

R = Random number from 1…X

Eric’s (really neat!) proof of bias only applies in the first case.

Eric: I was about to flame you, but then it occurred to me that you’re a smart guy who earns more than I’m ever likely to, so maybe – just maybe – I could learn something.

So why do we differ?

You like Sorting Shuffle because it’s simpler and easier to explain.

I like Knuth’s Shuffle because it’s more elegant and easier to prove correct.

When you’re managing programmers, Sorting Shuffle wins every time. Having said that, the Birthday Paradox still leaves me feeling a little uneasy…

Okay. That’s enough hair-splitting from me.

Mr. Sean Patterson has an interesting quote:
Assuming you’re using 8 “piles” you take the top card of the deck and put it in pile 1, the second in pile 2, and so on. When you go through all your cards (minimum of 40, no maximum) you then grab your piles at random and you have your deck again.

Which comes directly from one of the many Collectible Card Games (in this case I would guess Magic the Gathering, limited play has 40-card decks). It’s neat that he’s mentioned this shuffling method b/c a game like Magic hinges on a balanced distribution of a couple of classes of cards. You’ll have a deck comprised of 3 classes of cards in a 17-16-8 distribution and then you’ll start with 7 off of the top. For the game to really flow you’ll need 2-4 cards from the first class (17) 2-4 cards from the second class (16) and 1-2 cards from the last class (8).

Normally, this works, but every once in a while the hand is “no good” and you have to take a mulligan. To avoid this, rookie casual players will pick up the played cards (about half of the deck), pile them by class and then “weave” them together. They then do a quick shuffle: a couple of riffles and overhand cut or two. This type of shuffling generally cuts down on the mulligans because the cards are either nicely ordered or clumped in small groups (still acceptable).

Of course, the method described by Sean is actually considered cheating at the pro level (yes there is a pro level). “The weave” is something like an automatic disqualification and pro players don’t even stack their cards by type, they just pick up everything between games and shuffle away. The shuffling ritual can suck up 5 minutes of a 50 minute round, but hey it’s part of the game.

Of course, there’s also Magic: Online which provides a digital version of the game. Some class of online players have personified and villified “The Shuffler” as if some machine construct were out to get them.

In reality, the real-life pros also dominate the on-line game, despite all of this randomness. Which is likely a sign that they’re shuffling pretty well :slight_smile:

The guid approach sounds approximately as bad as the original random number generation approach. What they really need to do is focus on randomizing the seed value, and I really don’t think that would take much for this purpose.

I have a few ideas for one. Random typing by a trained monkey would be my second choice.

  • As others have noted, guids are not guaranteed to be a good source of randomness. Though most implementations are in fact crypto-strength random, it would be wise to call a method guaranteed to return crypto-strength randomness.

  • The problem with the swap algorithm is not that it is too complex. It is perfectly straightforward. The problem is that some of the n! possible outputs are more likely than others.

How would you write code to shuffle a deck of cards?

By going to a book of algorithms and looking up the optimal solution.

(I assume the question is a variant of the proposition that the best tool for repairing a TV is a TV repairman).

My first impulse was to randomise the cards as I was filling the array/list (populating a parallel set to check for dupes), not actually shuffling an existing deck. This, and the “pick a random card from the collection”, method does not come from the idea of a pre-existing ordered deck and I think that’s a good thing (because there is no “order” to be unexpectedly exposed should the randomness be less than expected).

Here’s a clip from the PokerStars website (they did their homework):

[quote]

SHUFFLE

“Anyone who considers arithmetic methods of producing random digits is, of course, in a state of sin.” - John von Neumann, 1951

We understand that a use of a fair and unpredictable shuffle algorithm is critical to our software. To ensure this and avoid major problems described in [2], we are using two independent sources of truly random data:

* user input, including summary of mouse movements and events timing, collected from client software
* true hardware random number generator developed by Intel [3], which uses thermal noise as an entropy source

Each of these sources itself generates enough entropy to ensure a fair and unpredictable shuffle.
Shuffle Highlights:

* A deck of 52 cards can be shuffled in 52! ways. 52! is about 2^225 (to be precise, 80,658,175,170,943,878,571,660,636,856,404,000,000,000,000,000 ways). We use 249 random bits from both entropy sources (user input and thermal noise) to achieve an even and unpredictable statistical distribution.
* Furthermore, we apply conservative rules to enforce the required degree of randomness; for instance, if user input does not generate required amount of entropy, we do not start the next hand until we obtain the required amount of entropy from Intel RNG.
* We use the SHA-1 cryptographic hash algorithm to mix the entropy gathered from both sources to provide an extra level of security
* We also maintain a SHA-1-based pseudo-random generator to provide even more security and protection from user data attacks
* To convert random bit stream to random numbers within a required range without bias, we use a simple and reliable algorithm. For example, if we need a random number in the range 0-25:
      o we take 5 random bits and convert them to a random number 0-31
      o if this number is greater than 25 we just discard all 5 bits and repeat the process
* This method is not affected by biases related to modulus operation for generation of random numbers that are not 2n, n = 1,2,..
* To perform an actual shuffle, we use another simple and reliable algorithm:
      o first we draw a random card from the original deck (1 of 52) and place it in a new deck - now original deck contains 51 cards and the new deck contains 1 card
      o then we draw another random card from the original deck (1 of 51) and place it on top of the new deck - now original deck contains 50 cards and the new deck contains 2 cards
      o we repeat the process until all cards have moved from the original deck to the new deck
* This algorithm does not suffer from "Bad Distribution Of Shuffles" described in [2]

PokerStars shuffle verified by Cigital and BMM International

PokerStars submitted extensive information about the PokerStars random number generator (RNG) to two independent organizations. We asked these two trusted resources to perform an in-depth analysis of the randomness of the output of the RNG, and its implementation in the shuffling of the cards on PokerStars.

Both independent companies were given full access to the source code and confirmed the randomness and security of our shuffle. Visit Online Poker Random Number Generator for more details.

[2] “How We Learned to Cheat at Online Poker: A Study in Software Security” - http://itmanagement.earthweb.com/entdev/article.php/616221
[3] “The Intel Random Number Generator” - <a href=“http://www.cryptography.com/resources/whitepapers/IntelRNG.pdf”">http://www.cryptography.com/resources/whitepapers/IntelRNG.pdf"

You should see what pokerstars.com writes on its website about its shuffle. For the random number generator, it supposedly uses, among other things, mouse movements of the users!

Greg

I’d probably use a few different “shuffling” techniques back to back – If possible, I’d change up the order in which the different techniques where applied.

It wouldn’t be as efficient as some people’s suggestions, but I think it’d produce a fairly random shuffle; at least as far as users would notice.

For those asking about why the built-in Rand() function and Timer seeds are less than optimal.

Rand() uses the generated Random number as the Seed to the next number. So even if you used the Timer to start the initial seed you can predict every number after the first randomly generated one as it is a sequnece that can be repeated every single time.

So to try to clarify, if the Random number generated using Rand() and a Timer seed was 52, then 52 is used to seed the next number and you will get the same value every time, in fact RandomNumber2 through Random NumberInfinity is now predictable.

All that using the Timer for a seed does is scramble the first possible number.

Rand() is classified as a Linear Congruential function.

A good random number generator has three properties:

  1. It generates evenly distributed numbers
  2. The values are unpredictable
  3. It has a long and complete cycle (it generates a large number of different values and all the values in the cycle can be generated).

Linear congruential functions meet the first property but fail the second property miserably! In other words, rand() produces an even distribution of numbers, but each next number is highly predictable!

Let’s look at the source code to a typical rand function:

unsigned long int next =1;
int rand(void)
{
next = next * 1103515245 + 12345;
return (next/65536) % 32768;
}

DotNet has a much better random number generator called RNGCryptoProvider.

I appreciate the elegance of Jeff’s solution in that it reduces the required code to just several lines. I have used this exact same problem in my programming classes and have seen solutions with 100+ lines of code over the years.

The other advantage to using random numbers and sorting is that it scales very well to very large number sets.

@WaterBreath
"Instead I suggest to just pull and discard numbers from your PRNG"

How, pray tell, do you intend to make your “pull and discard” algorithm non-deterministic?

Perhaps you can use a random timing?
But, oops, that requires a random number, which requires a non-deterministic PRNG… in other words, you need a non-deterministic PRNG to build your non-deterministic PRNG…

When I first saw this example of shuffling, I had no idea about syncing the random seed with another could lead to such a security hazard. But I don’t believe using a GUID is a perfect way to generate a random number, unless you know that it’s algorithm produces an equally distributed random number–and that is determined by the GUID algorithm’s “variant and the version” (http://www.famkruithof.net/guid-uuid-make.html).

So either be careful when choosing the API your GUID is generated from, seed your random generator with a unique number, or randomly sort the GUID’s

[If you don’t know about hash tables, find a tutorial about the various ways you can make them. Very interesting if you enjoy data structures and algorithms]

My idea is to use a hash table to sort the cards into by each cards generated GUID. By using a randomly generated number in the hash tables algorithm, this guarentees evenly distributed shuffling as well as essentially O(n)–which is only interesting when you come across much a much larger amount of cards (i.e. encryption and such).

In this way, if you use an API which is inflexable (doesn’t let you seed your random number generator or choose your GUID algorithm), you can still adapt an equally distributed GUID method.

If you need real random numbers ie your business depends on it. You need
to decaying radioactive material.

Example of that is below:

http://www.fourmilab.ch/hotbits/

http://www.fourmilab.ch/hotbits/hardware3.html

nuff said.