Shuffling

EricL · December 6, 2007, 12:00am

My apologies, indeed, I conflated Knuth’s algorithm (which has no bias) with the algorithm on Mike’s blog (which does have bias). Knuth’s algorithm has n! possible paths, each to a distinct output, each equally likely. Julian’s algorithm has n^n possible paths to n! distinct outputs, and since n! does not usually divide n^n evenly, some of those have to be overrepresented.

vince3 · December 6, 2007, 12:00am

All the talk about random number generators aside, I think we’ve lost sight of something here: How physical shuffling is done. The “perfect” shuffle algorithm actually shuffles like a casino dealer would, not in some pseudo perfect mathematical sense. Standard gaming rules dictate 7 shuffles in a standard 2 halves fanned back into 1 style with a cut at the end. This means that any card starting at the dead bottom of the deck has no way of making it all the way to the top of the deck, and vice versa prior to the cut. In short, the shuffle is not a true random reordering by any stretch.

Anyway, here’s my take on how I’d do this:

Start with 52 cards in an array.
Split the deck at a “random” half somewhere between the 1/3 and 2/3 marks with weighting towards the middle.
Walk the index alternating between your start position and your half position reading 1-5 cards at a time at random (see 3a)
3a) The random algorithm used would need to have 2 weights included. first, a base weight towards 2 or 3 for realism. Second, a calculated weight based on the half of the decks relative remaining size as compared to the other halfs remaining size. This “corrects” instances of one side of the deck running out of cards significantly faster than the other, as any physical shuffler would also do.
Repeat steps 2-3 for a total of 7 iterations
Allow the player a way to “cut” the deck, or calculate a "random cut anywhere in the deck, weighted towards center.

Not in code because, honestly, I don’t have time right now. Maybe I’ll do it on the quick and dirty via Ruby. I think this is closer to reality of a shuffle, though.

Cardc · December 6, 2007, 12:00am

Sorry to ask a silly question in the midst of such heady discussion, but I just can’t figure out the snippet:

var cards = Enumerable.Range(0, 51);
var shuffledcards = cards.OrderBy(a = Guid.NewGuid());

What the heck is a?
Where is/are the other params to OrderBy (docs say this method needs at least 2)
In the lambda expression, how can a be a param to NewGuid which doesn’t take any?

WaterBreath · December 6, 2007, 12:00am

Re: The first problem

Several people have mentioned using implicit user input data as the actual source of random numbers (mouse coordiantes, key press distributions, etc). I think that in many cases, however, this is still vulnerable to reverse-engineering by a resourceful hacker. Plus your range of possible values is limited by the granularity of the implicit user input data.

Instead I suggest to just pull and discard numbers from your PRNG in a core loop somewhere in your code. Maybe a message-processing loop, or when waiting on user input. Just make sure it’s somewhere that gets run on a fairly constant basis anytime the app is running. This is a simple way to ensure there are no “synch points” in your app that a hacker can count on to line up their PRNG with your app’s. Because you’re constantly generating numbers and there’s no way to know how many will be generated between the last time you actually used one and the next time you will actually use one.

Mike · December 6, 2007, 12:00am

Re: Bias
At first I thought your algorithm had a bias because (to paraphrse Peter) you’re choosing a card to swap with in the range [1…X] instead of using the the range [1…52]

wikipedia on Knuth’s Shuffle (http://en.wikipedia.org/wiki/Knuth_shuffle) gives a good description as to why this isn’t the case. In brief, swaping with a card in the range [1…52] gives 52^52 possible swap sequences, but there’s only 52! possible decks. 52^52 52! so there are multiple sequences leading to a particular deck. 52^52 is not divisible by 52! so some decks have more sequences leading to them then others.

Thanks for a great article!

Craig · December 6, 2007, 12:00am

Catto must be sick, how long has it been since he missed a post?

James · December 7, 2007, 12:00am

Thinking about my comment I made before, and using a complicated hash table (which is very interesting to learn) to sort your data into, and then take it out in sequential order is a bad way to do it. Of course it was the only thing I could think of on my 15 minute break at work.

Using your GUID idea, just adding the randomly generated number to the GUID would guarentee a “good enough” random dispersion with a unique number.

#If random number is a long integer Then
randomIndexOfCard = random() + (getGUID() mod [larges random number possible]);
#Else
randomIndexOfCard = random() * sizeof(long int) + getGUID();
#Endif

Loop until each card indexed and sort the cards as a shuffle.

Then all you have to do is shuffle, rince, and repeat! (7 times sounded like a good number :)…

Olivierd2 · December 7, 2007, 12:00am

Eh Jeff, you disapoint me!
Do you really need to show algo-complexity when N=52? Let asume you do not do a N^8 algo, and it OK with that.

Second point is: Random() sucks for cryptography but it is enough for human’s game. Assuming no one can know the initial generator number, rand.Next() is ok.

Tell me if i wrong, but i belive if i give you this algo:
Random rand = new Random(x);
while(true)
{
Console.WriteLine(rand.Next(100)%2 == 0);
}

And i give you the 50000 first results:
true
false
false
false
false
true
true
…
Do you will be able to retrieve the “x” number?
I think the answer is no.

In this case the only problems for a casino is to find a good “x” that now one can predict. I have no idea of how to.

It seem to me that all of your sort carts algo are OK. So the best one is the easier to understand.

Carra · December 8, 2007, 12:00am

Mmm, Java has a shuffle functin, I had no idea.

Wrote my own shuffle mechanism which is exactly the same as the one posted…

I guess it does prove that it’s easy Although I was pleased that it went in O(n).

Waterbreath1 · December 8, 2007, 12:00am

How, pray tell, do you intend to make your “pull and discard” algorithm non-deterministic?

Maybe I am being naive here… But if so could someone explain what about my proposal is any less deterministic than what others have proposed?

How could someone predict how many numbers will be discarded unused while, for example, the app is waiting for the user to click a button? The actual numbers used for the randomness-critical app features will not be consecutive numbers generated by the PRNG. How far we jump forward in the numbers we actually use depends on how many cycles of processor time are actually spent in that loop, discarding numbers. I’m sure this isn’t completely unobtainable info, but I don’t think it is any easier than, say, hooking into the mouse location data.

I’d guess this has a similar effect to reseeding your generator often. But, not knowing a whole heck of a lot about the math behind PRNGs… maybe you’re saying this somehow makes the numbers less random?

WaterBreath · December 8, 2007, 12:00am

Perhaps you can use a random timing?
But, oops, that requires a random number, which requires a non-deterministic PRNG… in other words, you need a non-deterministic PRNG to build your non-deterministic PRNG…

If I’d read the comment a little more carefully, I would have quoted his question in my last comment, because that is what I am answering.

Again, I’m not saying it’s perfect. It can be hacked. I’m just saying that it’s no worse than other people’s more complex solutions.

Steve · December 9, 2007, 12:00am

For all you old-timers out there, heres some code you can run on your VIC 20 to shuffle those cards…

…
5 cntr%=1
7 REM ** cntr% holds the order in which cards are drawn
10 DIM a(52)
13 REM ** 52 cards in a standard deck
15 REM ** initialise them to 0
17 FOR c=1 TO 52:a©=0: NEXT c
19 REM ** pick a card
20 i%= INT( RND(1)*52)+1
25 REM ** look to see if this item can be inserted
30 IF a(i%) 0 GOTO 20
35 a(i%)=cntr%:cntr%=cntr%+1
37 REM ** if we haven’t picked 52 cards, go again
40 IF cntr% 53 GOTO 20
45 REM **
47 REM ** display results
49 REM **
50 FOR z=1 TO 52: PRINT a(z): NEXT z
…

Not so much Try and catch, more like pray and hope ;o)

Andrew_R · December 9, 2007, 12:00am

@Waterbreath

… could someone explain what about my proposal is any less deterministic than what others have proposed?

Determinism is all I was pointing out. Anything that is deterministic makes up no part of the randomness in the result.
eg. Giving a PRNG it’s own output as a new input seed does nothing for the randomness (though some generators still do this for security reasons). Ie. anything that you know to be deterministic may as well be removed.

How could someone predict how many numbers will be discarded unused while, for example, the app is waiting for the user to click a button?

Phrase the question slightly differently: where is the true source of randomness?
If you said “the timing of the mouse click”, you’d be right. Thus, it is the only thing an attacker needs to control, eg. by stuffing a mouse-click event into the queue by some means, and voila, complete compromise; the attacker knows the exact state of the PRNG. Bugger.

Conclusion: pulling numbers from the PRNG did nothing to increase the randomness, you were actually relying solely on the mouse click for all your entropy.
In other words: you may as well not bother with pulling numbers because it’s deterministic. The mouse-click was good, its (one of) your source(s) of randomness; when it occurs you can roll the timing into the PRNGs input. It will contribute whatever tiny entropy it might have (possibly none if the attacker has corrupted it - but that shouldn’t matter because you have many other sources… right?).

I’d guess this has a similar effect to reseeding your generator often…

Ah, I see the problem. You are accustomed to the cute little things referred to as “RNGs” in many standard libraries; you seed them once and then use them forever. Those are a poor shadow of a PRNG - you basically just shouldn’t ever use them. They are, often, trivially easy to brute-force attack.

A real PRNG has inputs as well as an output. “seeding” as you know it, should be occurring frequently in an unending stream, never ceasing, from many sources.

So… what are the sources?

Anything.

If you aren’t certain of the value of a source (and you shouldn’t be, because who can accurately judge the entropy in a mouse-click?), stuff it in anyway. A real PRNG is unaffected by a compromised entropy source because its output is the ENTROPY SUM of it’s inputs. Thus, if one source is compromised (or simply deterministic from the outset), it is merely a source with zero entropy; it is not detrimental in any other way (so long as the remaining sources sum to the level of entropy required).

TomR · December 11, 2007, 12:00am

“The standard way of implementing this algorithm is: associate each card with a random real number between 0.0 and 1.0. Sort the list based on its associated number. That’s O(n log n) and has no bias.”

Couldn’t you use integers instead of real numbers, then use a non-comparison sort like bucket/radix sort?

JavaGuy147 · May 12, 2008, 12:00am

this is my way of getting a random number (please don’t criticize the memory access part, simply ACCESSING memory is safe)

When the app starts, itll do a simple srand (i use C/C++) with the parameter of the the time in milliseconds divided by how long the app has been running. it then mallocs an array (no calloc, thatll make everything 0) the length of the random number. It then finds out how much time has passed since the last time we checked the apps run-time. the difference is modulated by a number produced by a call to rand() untill it is a number small enough to fit in the allocated memory. Then access the that entity of the array and you will get a random number from bytes that arent even part of your programs. If its 0, the program retries with a dif rand number until a successful number is obtained

seems long and complicated, but doesnt use any long or system specific procedures and is quite fast since all are functions are part of the language and simple math/memory operations

J1127 · June 22, 2008, 12:00am

They could collect a seed like truecrypt does with mouse moves and/or some mandelbrot point.

Pi1 · August 24, 2008, 12:00am

I’d just do the calculation for each card as required.

Less tampering possible I’d think

RRyyaann · September 3, 2008, 12:00am

This seems to do the trick very well for me, ported from an old ass javascript shuffle prototype I made a loooooong time ago.

One thing I didn’t understand with it though, is if I used
r.Next(0, m_cards.Count - 1) which would have been the true final index, it stuck itself in an infinite loop, oh wellz. works now =)

    public void Shuffle() 
    {
        Random r = new Random();

        Listint used = new Listint();
        ListCard temp = new ListCard();
        do
        {
            int index = r.Next(0, m_cards.Count);

            while (used.IndexOf(index)  -1)
                index = r.Next(0, m_cards.Count);

            used.Add(index);
            temp.Add(m_cards[index]);
        }
        while (temp.Count  m_cards.Count);

        m_cards.Clear(); // not necessary
        m_cards = temp;
    }

RRyyaann · September 3, 2008, 12:00am

why did my msg get removed x.x

RRyyaann · September 3, 2008, 12:00am

okay and now it’s not. what the heck.