Computers are Lousy Random Number Generators

The .NET framework provides two random number generators. The first is System.Random. But is it really random?


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2006/11/computers-are-lousy-random-number-generators.html

One (very) crude test for randomness is how compressible the data is. I ran WinRAR’s “best” compression method on two 1-megabyte samples of pseudo-random data, and both times the file ended up larger after compression. :wink: Here’s an amusing related anecdote from Larry Osterman:

https://web.archive.org/web/20070120180055/http://blogs.msdn.com/larryosterman/archive/2005/04/19/409723.aspx
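That compressibility check is easy to reproduce without WinRAR; here is a minimal sketch using Python’s zlib, with os.urandom standing in for whichever pseudo-random source you want to test:

```python
import os
import zlib

# Crude randomness check: incompressible data is (weak) evidence of randomness.
data = os.urandom(1_000_000)         # 1 MB from the OS entropy source
compressed = zlib.compress(data, 9)  # maximum compression effort

# Random bytes carry ~8 bits of entropy each, so deflate cannot shrink them;
# the container overhead actually makes the "compressed" output slightly larger.
print(len(data), len(compressed))
```

Swap in bytes from System.Random (or anything else) and the same check applies.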

http://www.fourmilab.ch/random/ provides a nice tool for quantitatively measuring randomness.

ent.exe results for 1 million System.Random bytes:

Entropy = 7.999794 bits per byte.

Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.

Chi square distribution for 1000000 samples is 286.23, and randomly
would exceed this value 10.00 percent of the times.

Arithmetic mean value of data bytes is 127.4788 (127.5 = random).
Monte Carlo value for Pi is 3.138900556 (error 0.09 percent).
Serial correlation coefficient is 0.001206 (totally uncorrelated = 0.0).

ent.exe results for 1 million System.Security.Cryptography.RandomNumberGenerator bytes:

Entropy = 7.999816 bits per byte.

Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.

Chi square distribution for 1000000 samples is 254.68, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.4412 (127.5 = random).
Monte Carlo value for Pi is 3.148140593 (error 0.21 percent).
Serial correlation coefficient is 0.000330 (totally uncorrelated = 0.0).

ent.exe results for 1 million http://www.random.org bytes:

Entropy = 7.999818 bits per byte.

Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.

Chi square distribution for 1000000 samples is 251.58, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.5342 (127.5 = random).
Monte Carlo value for Pi is 3.143316573 (error 0.05 percent).
Serial correlation coefficient is -0.000293 (totally uncorrelated = 0.0).

So it looks like the System.Security.Cryptography.RandomNumberGenerator output is significantly closer to true random – or at least its statistics are more in line with random.org’s, with the exception of the Monte Carlo Pi calculation, which is noticeably worse!

Funny thing is, people aren’t that random either. Ask anyone to pick 5 ‘random’ numbers from 1-20, and you get a distribution that’s generally spaced out.

A genuinely random pick has no preference for spacing.
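That spacing intuition is easy to check by simulation. Under true randomness, 5 distinct picks from 1-20 include at least one adjacent pair (like 7 and 8) about 72% of the time – exactly 1 − C(16,5)/C(20,5). A quick Python check:

```python
import random

# How often does a truly random pick of 5 distinct numbers from 1..20
# contain at least one adjacent pair?  Exact answer: 1 - C(16,5)/C(20,5) = 0.718...
trials = 100_000
hits = 0
for _ in range(trials):
    picks = sorted(random.sample(range(1, 21), 5))
    if any(b - a == 1 for a, b in zip(picks, picks[1:])):
        hits += 1

print(f"fraction with an adjacent pair: {hits / trials:.3f}")  # ≈ 0.72
```

Humans asked for ‘random’ picks tend to avoid exactly those adjacent pairs, which is one easy tell.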

Having a human in the loop is generally worse than using a PRNG. That’s why the RAND table of random digits is such a valuable reference - it removes our own peculiarities about what we ourselves deem ‘random’ numbers.

Even the WASTE method isn’t really random. People aren’t going to move their mouse randomly; there is some sort of pattern to it. It’s the same idea as asking a person to select a random number: the result isn’t really random either, because we have a predisposition to select certain numbers.

The very philosophical might even suggest that there is in fact nothing random at all, about anything, anywhere. It’s only that we lack data and/or methods of prediction.

It’s easy for a computer to gather random data. It can collect entropy from the noise on the input channel of the sound card - random thermal noise, no extra devices required. Sometimes I wonder why no one seems to have thought of doing this yet. Of course you can’t generate millions of random numbers per second this way, but a few per second is easily doable, I’d say.
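If you do sample analog noise, the raw bits come out biased and correlated, so they need whitening before use. One classic technique (purely illustrative here – not tied to any particular sound-card API) is von Neumann debiasing, which trades bit rate for unbiased output:

```python
import random

def von_neumann_debias(bits):
    """Turn an independent-but-biased bit stream into an unbiased one.
    Consume bits in pairs: 01 -> 0, 10 -> 1, and discard 00 and 11.
    'bits' could be, say, the least-significant bits of sound-card samples."""
    it = iter(bits)
    for a, b in zip(it, it):
        if a != b:
            yield a

# Demo with a "coin" that lands 1 about 80% of the time.
biased = (1 if random.random() < 0.8 else 0 for _ in range(100_000))
debiased = list(von_neumann_debias(biased))
print(sum(debiased) / len(debiased))  # close to 0.5, at a fraction of the input bit rate
```

The catch is exactly the one the commenter anticipates: you pay for the cleanup in throughput, so this suits a trickle of seeds rather than bulk generation.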

Schneier and Ferguson note in “Practical Cryptography” (chapter 10) that:

In the context of a cryptographic system, we have more stringent requirements [than statistical randomness does]. Even if the attacker sees a lot of the random data generated by the PRNG, she should not be able to predict anything about the rest of the output of the PRNG. We call such a PRNG cryptographically strong. […]

Forget about the normal random function in your programming library, because it is almost certainly not a cryptographic PRNG. Many libraries ship with a PRNG that fails even simple statistical tests. Unless the cryptographic strength is explicitly documented, you should never use a library PRNG.

(And I note that your experiments with the two PRNGs in the .NET library seem to bear this out…)

They go on to talk about the difference between PRNGs and true random:

There is a theoretical argument that real random data is better than pseudorandom data from a PRNG. In certain cryptographic protocols you can prove that certain attacks are impossible if you use real random data. The protocol is unconditionally secure. If you use a PRNG, then the protocol is only secure as long as the attacker cannot break the PRNG; the protocol is computationally secure. This distinction, however, is only relevant for people in ivory towers. All cryptographic protocols use computational assumptions for almost everything. Removing the computational assumption for one particular type of attack is an insignificant improvement, and real random data, which you need for the unconditional security, is so difficult to generate that you are far more likely to reduce the system security by trying to use real random data.

Instead they argue that the real random data generator should be used to seed a PRNG, which seems like good advice to me.
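In Python terms, the pattern looks something like this (just an illustration of the idea, not production crypto - Python’s secrets module is the right tool for actual keys):

```python
import os
import random

# Seed a fast deterministic PRNG from a small amount of true OS entropy.
seed = int.from_bytes(os.urandom(32), "big")  # 256 bits from the OS pool
rng = random.Random(seed)                     # Mersenne Twister, now well-seeded

# The PRNG can now produce output far faster than the entropy source could.
bulk = bytes(rng.randrange(256) for _ in range(64))
print(bulk.hex())
```

The slow, scarce true-random bits are spent once on the seed; the cheap deterministic generator does the volume work.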

I’m not sure noise from a soundcard is that random.

Wouldn’t noise (if any) from a sound card be primarily noise from the power grid, i.e. 50 or 60 Hz? I guess you could filter that out, but not even expensive EMI filters succeed fully at that.

Second, any remaining noise is probably particular to that specific model, so another card of the same model would show much the same characteristics, although probably not identical ones.

I think the problem is if there is enough (random) noise at all.

It could be one source but not the single source.

I found this related article regarding random number algorithms for online poker interesting since I used to play at PlanetPoker :

I now play at PokerStars; they claim their open algorithm provides sufficient randomness – I certainly hope so :slight_smile: :


You can’t test for true randomness. What you can do is test for patterns in the allegedly random data. Even then, it’s possible for genuinely random data to show up as non-random (see http://en.wikipedia.org/wiki/Infinite_monkey_theorem).

Consider this: Let’s say you have a machine that continuously prints random bits generated from a radioactive source onto sheets of paper. Let’s also say that you have an encrypted telephone system that needs 256 bits of random data in order to function securely.

If you take 50 pages of output from this machine, select a block of 256 bits from somewhere in these 50 pages, program those bits into your encrypted telephone system, and then incinerate the pages, you can rest assured that your secret telephone conversations will remain confidential.

However, if, instead of incinerating those same 50 pages, you publish them in a special edition of The New York Times, then your secret telephone conversations will be much more vulnerable to eavesdropping.

Notice the two different outcomes: In the first scenario, we’d say that the bits are random, but in the second scenario, we’d say that they are highly non-random, and this is true even though the bits themselves are exactly the same in both cases.

When you’re generating random bits for use with cryptography, what you’re really concerned about (among other things) is limiting the probability that an attacker will be able to guess your key. The bits themselves are irrelevant.
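The telephone scenario above is the one-time pad in disguise: XOR the message with truly random key bits and confidentiality holds unconditionally - but only while the pad itself stays secret. A minimal sketch:

```python
import os

def otp_xor(data: bytes, pad: bytes) -> bytes:
    """One-time pad: XOR each message byte with a pad byte.
    The pad must be truly random, at least as long as the message,
    and never reused or disclosed."""
    assert len(pad) >= len(data)
    return bytes(d ^ p for d, p in zip(data, pad))

message = b"secret phone call"
pad = os.urandom(len(message))    # the block of bits cut from the 50 pages
ciphertext = otp_xor(message, pad)

assert otp_xor(ciphertext, pad) == message  # decryption is the same XOR
# Incinerate 'pad' and the ciphertext is information-theoretically useless;
# publish it in the Times and anyone can recover the message the same way.
```

Which is exactly the point: the same bits are “random enough” or not depending entirely on what the attacker knows about them.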

Yes, the real world is the best way to provide true randomness.
In a recent TechEd session about cryptography, Rafal Lukawiecki mentioned particle emission from radioactive material (uranium, plutonium…) as a good random generation method.
He also said that a portable random generator (with plutonium inside) would be an extra-cool gadget :stuck_out_tongue:

The analog world is not reaaallly random either. Trees growing in a forest can be predicted by the distance seeds fall. Clouds in the sky can be predicted by moisture levels. Waves in an ocean by moon phases, currents, and weather. An ant crawling on the ground by a chemical trail. Grains of sand on a beach by nearby sediment and rock types, etc. Wouldn’t all those seem random? So, what really is random?

Intel announced a hardware random number generator (based on thermal noise) to be released alongside the Pentium III, but it somehow got lost along the way. I googled a bit but could not find a link to post here, and all the references are pre-release comments.

DMB,
you might want to read a little more closely:

Another source of entropy could be atmospheric noise from a radio, like that used here at random.org, or even just background noise from an office or laboratory.

Modern sound cards can sample at about 192 kHz on the high end and 48 kHz on the low end, with 24 bits per sample. If you take only the low bit of each sample, you end up with a minimum of 48,000 random bits a second; even if you assume you can use a few more bits per sample without compromising the randomness, and munge them around somehow, you’re far short of your target if you need millions or billions of bits a second for data transmission.
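A back-of-the-envelope version of that arithmetic (the sample rates are the commenter’s figures):

```python
# Throughput if you keep only the least-significant bit of each sample.
low_end_hz = 48_000    # samples per second, low-end card
high_end_hz = 192_000  # samples per second, high-end card

for hz in (low_end_hz, high_end_hz):
    bits_per_sec = hz * 1  # one usable bit per sample
    print(f"{hz} Hz -> {bits_per_sec:,} bits/s ({bits_per_sec / 8 / 1024:.1f} KiB/s)")
```

A few kilobytes per second: plenty for seeding, hopeless for bulk keystream.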

That said, I’ve actually seen software that uses this for low-volume needs, combined with other environmental factors. Most people just don’t want to have to deal with DirectInput or MCI to get to it.

Chris,
any PRNG gets seeded. System.Random takes the time (or a user-supplied value) as its seed; RandomNumberGenerator generates its seed from higher-entropy system parameters and presumably reseeds occasionally.

Alistair,
according to MSDN, RandomNumberGenerator is cryptographically secure, as I’d expect any newer language’s PRNG to be now that people actually care about that. Most of the 3rd party C and Fortran enhancements have been used by Python, .Net, and other recent languages. (Java’s built-in sucks balls though: http://alife.co.uk/nonrandom/ )

Well, if the analog world is so not-random, why aren’t our weather forecasts 100% accurate by now? I think it’s chaos theory at work…

And just to show some pedigree on audio randomness:
http://www.cs.berkeley.edu/~daw/rnd/audio-entropyd ('99)
http://cypherpunks.venona.com/date/1992/11/msg00011.html ('92)

I’m sure there’s older, but I didn’t check.

I remember in high school we had a maths reference book that included a page of “random numbers” that bore a strong resemblance to the digits of pi.

This is going to only be of any help to any C/C++ coders, but there’s an ANSI-C header available called mt.h (Mersenne Twister) that allows you to generate a better spread of randomness than the default random number generator used with most C compilers.

Back when I used to code for the GBA using HAM, I would use mt.h instead of the included random function because it would generate random numbers on limited hardware much more efficiently. The results were staggeringly different. Using the built-in generator, I could make a ghost appear on a field and usually tell where the ghost would wind up based on when I pressed the key. Using mt.h I could never predict anything, which is precisely what I was looking for.

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
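Python users get this for free: the standard random module is itself a Mersenne Twister (MT19937), so the contrast the commenter describes is easy to play with:

```python
import random

# Python's random module implements MT19937, the same algorithm as mt.h:
# a period of 2**19937 - 1 and far better statistical behavior than the
# simple linear-congruential rand() shipped with many older C libraries.
rng = random.Random(0xBADC0DE)  # arbitrary, reproducible seed

# Hypothetical ghost spawn points on a 240x160 GBA screen.
positions = [(rng.randrange(240), rng.randrange(160)) for _ in range(3)]
print(positions)
```

The seed fully determines the sequence - which is exactly why a seed derived from something like key-press timing is the weak point, not the generator itself.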

“The very philosophical might even suggest that there is in fact nothing random at all, about anything, anywhere. It’s only that we lack data and/or methods of prediction.”

I’ll go with that. The trick isn’t about creating random numbers (I’ll challenge you all to define random; I sure as hell can’t, pardon the language). The trick is about creating the hardest possible numbers to predict.

Okay, I might have gotten too philosophical with this, but that’s the way I like it :wink: