One (very) crude test for randomness is how compressible the data is. I used WinRAR's "best" compression method on 1 megabyte of pseudo-random data, and both times the file ended up larger after compression. Here's an amusing related anecdote from Larry Osterman:
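The same crude test can be scripted without WinRAR. Here is a minimal sketch using Python's zlib at its highest compression level, with os.urandom standing in for the pseudo-random data:

```python
import os
import zlib

# Crude randomness check: genuinely random bytes should not compress.
data = os.urandom(1_000_000)         # 1 MB from the OS's RNG
compressed = zlib.compress(data, 9)  # level 9 is zlib's "best" setting

print(f"original: {len(data):,} bytes, compressed: {len(compressed):,} bytes")
# On random input the "compressed" stream typically comes out slightly
# LARGER, since deflate finds nothing to exploit and adds its own overhead.
```

This is only a sanity check, not a proof of randomness: plenty of non-random streams are also incompressible.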
ent.exe results for 1 million System.Random bytes:
Entropy = 7.999794 bits per byte.
Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.
Chi square distribution for 1000000 samples is 286.23, and randomly
would exceed this value 10.00 percent of the times.
Arithmetic mean value of data bytes is 127.4788 (127.5 = random).
Monte Carlo value for Pi is 3.138900556 (error 0.09 percent).
Serial correlation coefficient is 0.001206 (totally uncorrelated = 0.0).
ent.exe results for 1 million System.Security.Cryptography.RandomNumberGenerator bytes:
Entropy = 7.999816 bits per byte.
Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.
Chi square distribution for 1000000 samples is 254.68, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.4412 (127.5 = random).
Monte Carlo value for Pi is 3.148140593 (error 0.21 percent).
Serial correlation coefficient is 0.000330 (totally uncorrelated = 0.0).
Entropy = 7.999818 bits per byte.
Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.
Chi square distribution for 1000000 samples is 251.58, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.5342 (127.5 = random).
Monte Carlo value for Pi is 3.143316573 (error 0.05 percent).
Serial correlation coefficient is -0.000293 (totally uncorrelated = 0.0).
So it looks like the System.Security.Cryptography RNG is significantly closer to true random. Or at least the statistics are in line, with the exception of the Monte Carlo Pi calculation, which is really off!
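For anyone who wants to reproduce numbers like these without ent.exe, a few of its statistics are easy to recompute. A minimal Python sketch follows, using os.urandom as the byte source; note that ent's real Monte Carlo method uses 24-bit coordinates per axis, so the single-byte version here is coarser:

```python
import math
import os
from collections import Counter

def ent_stats(data: bytes):
    """Recompute a few of ent's statistics for a byte stream."""
    n = len(data)
    counts = Counter(data)

    # Shannon entropy in bits per byte (8.0 = ideal for random bytes).
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Arithmetic mean (127.5 = ideal for uniform bytes).
    mean = sum(data) / n

    # Monte Carlo pi: successive byte pairs are points in a 256x256
    # square; count how many land inside the inscribed quarter circle.
    inside = 0
    pairs = n // 2
    for i in range(0, 2 * pairs, 2):
        x, y = data[i], data[i + 1]
        if x * x + y * y < 256 * 256:
            inside += 1
    pi_est = 4.0 * inside / pairs

    return entropy, mean, pi_est

entropy, mean, pi_est = ent_stats(os.urandom(1_000_000))
print(f"Entropy = {entropy:.6f} bits per byte")
print(f"Arithmetic mean = {mean:.4f} (127.5 = random)")
print(f"Monte Carlo pi = {pi_est:.6f}")
```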
Funny thing is, people aren't that random either. Ask anyone to pick 5 "random" numbers from 1-20, and you get a distribution that's generally spaced out.
A genuinely random pick has no preference to spacing.
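That spacing preference is easy to quantify. A quick simulation (seed fixed for reproducibility) shows how often a genuinely uniform pick of 5 distinct numbers from 1-20 contains at least one consecutive pair, something human "random" picks tend to avoid:

```python
import random
from math import comb

# Humans asked for "random" numbers tend to space their picks out.
# A truly uniform pick of 5 distinct numbers from 1..20, however,
# contains at least one consecutive pair most of the time.
trials = 100_000
rng = random.Random(42)  # fixed seed so the run is reproducible
adjacent = 0
for _ in range(trials):
    picks = sorted(rng.sample(range(1, 21), 5))
    if any(b - a == 1 for a, b in zip(picks, picks[1:])):
        adjacent += 1

rate = adjacent / trials
# Exact value by a stars-and-bars count of non-adjacent selections:
exact = 1 - comb(16, 5) / comb(20, 5)
print(f"simulated: {rate:.3f}, exact: {exact:.3f}")
```

The exact probability works out to about 71.8%, so nearly three out of four truly random picks contain a consecutive pair.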
Having a human in the loop is generally worse than using a PRNG. That's why the RAND table of random digits is such a valuable reference: it removes our own peculiarities about what we ourselves deem "random" numbers.
Even the WASTE method isn't really random. People aren't going to move their mouse randomly; there is some sort of pattern to it. It's the same idea as asking a person to select a random number: that's not really random either, because we have a predisposition to select certain numbers.
The very philosophical might even suggest that there is in fact nothing random at all, about anything, anywhere. It's only that we lack data and/or methods of prediction.
It's easy for a computer to generate random data. It can collect entropy from the noise on the input channel of the sound card: perfect, random, thermal noise, with no extra devices required. Sometimes I wonder why no one seems to have thought of doing this yet. Of course you can't generate millions of random numbers per second this way, but a few per second is easily doable, I'd say.
Schneier and Ferguson note in "Practical Cryptography" (chapter 10) that:
In the context of a cryptographic system, we have more stringent requirements [than statistical randomness]. Even if the attacker sees a lot of the random data generated by the PRNG, she should not be able to predict anything about the rest of the output of the PRNG. We call such a PRNG cryptographically strong. […]
Forget about the normal random function in your programming library, because it is almost certainly not a cryptographic PRNG. Many libraries ship with a PRNG that fails even simple statistical tests. Unless the cryptographic strength is explicitly documented, you should never use a library PRNG.
(And I note that your experiments with the two PRNGs in the .NET library seem to bear this out…)
They go on to talk about the difference between PRNGs and true random data:
There is a theoretical argument that real random data is better than pseudorandom data from a PRNG. In certain cryptographic protocols you can prove that certain attacks are impossible if you use real random data. The protocol is unconditionally secure. If you use a PRNG, then the protocol is only secure as long as the attacker cannot break the PRNG; the protocol is computationally secure. This distinction, however, is only relevant for people in ivory towers. All cryptographic protocols use computational assumptions for almost everything. Removing the computational assumption for one particular type of attack is an insignificant improvement, and generating the real random data, which you need for the unconditional security, is so difficult that you are far more likely to reduce the system security by trying to use real random data.
Instead they argue that the real random data generator should be used to seed a PRNG, which seems like good advice to me.
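That advice maps to a tiny pattern: gather real entropy once, then expand it with a deterministic generator. Here is a toy hash-counter sketch; the class name and design are mine, not from the book, and production code should use a vetted design (e.g. a NIST SP 800-90A DRBG) rather than this illustration:

```python
import hashlib
import os

class HashDRBG:
    """Toy sketch of 'seed a PRNG from real random data': a hash-based
    generator run in counter mode. Illustrative only, not a vetted DRBG."""

    def __init__(self, seed: bytes):
        # Condense the real-entropy seed into a fixed-size internal key.
        self._key = hashlib.sha256(seed).digest()
        self._counter = 0

    def read(self, n: int) -> bytes:
        # Expand the key deterministically: hash(key || counter) blocks.
        out = b""
        while len(out) < n:
            block = hashlib.sha256(
                self._key + self._counter.to_bytes(8, "big")
            ).digest()
            out += block
            self._counter += 1
        return out[:n]

# Seed from the operating system's entropy pool ("real random data"),
# then generate output quickly and deterministically.
drbg = HashDRBG(os.urandom(32))
print(drbg.read(16).hex())
```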
I'm not sure noise from a soundcard is that random.
Wouldn't noise (if any) from a soundcard be primarily noise from the power grid, i.e. 50 or 60 Hz? I guess you could filter that out, but not even expensive EMI filters fully succeed at that.
Second, any remaining noise is probably particular to that specific model, so another card would exhibit much the same characteristics, although probably not identical ones.
I think the question is whether there is enough (random) noise at all.
You can't test for true randomness. What you can do is test for patterns in the allegedly random data. Even then, it's possible that random data will show up as non-random (see http://en.wikipedia.org/wiki/Infinite_monkey_theorem).
Consider this: Let's say you have a machine that continuously prints random bits generated from a radioactive source onto sheets of paper. Let's also say that you have an encrypted telephone system that needs 256 bits of random data in order to function securely.
If you take 50 pages of output from this machine, select a block of 256 bits from somewhere in these 50 pages, program those bits into your encrypted telephone system, and then incinerate the pages, you can rest assured that your secret telephone conversations will remain confidential.
However, if, instead of incinerating those same 50 pages, you publish them in a special edition of The New York Times, then your secret telephone conversations will be much more vulnerable to eavesdropping.
Notice the two different outcomes: In the first scenario, we'd say that the bits are random, but in the second scenario, we'd say that they are highly non-random, and this is true even though the bits themselves are exactly the same in both cases.
When you're generating random bits for use with cryptography, what you're really concerned about (among other things) is limiting the probability that an attacker will be able to guess your key. The bits themselves are irrelevant.
Yes, the real world is the best way to provide true randomness.
In a recent TechEd session about cryptography, Rafal Lukawiecki mentioned particle emission from radioactive material (uranium, plutonium, …) as a good random-generation method.
He also said that a portable random generator (with plutonium inside) would be an extra-cool gadget.
The analog world is not reaaallly random either. Trees growing in a forest can be predicted by the distance seeds fall. Clouds in the sky can be predicted by moisture levels. Waves in an ocean by moon phases, currents, and weather. An ant crawling on the ground by a chemical trail. Grains of sand on a beach by nearby sediment and rock types, etc. Wouldnāt all those seem random? So, what really is random?
Intel announced a hardware random number generator (based on thermal noise) to be released with the Pentium III, but it somehow got lost along the way. I googled a bit but could not find a link to post here, and all the references are pre-release comments.
Another source of entropy could be atmospheric noise from a radio, like that used at random.org, or even just background noise from an office or laboratory.
Modern soundcards can sample at about 192 kHz on the high end and 48 kHz on the low end, at 24 bits per sample. If you took only the lowest bit of each sample, you'd end up with a minimum of 48,000 random bits per second; even if you assume you can use a few more bits per sample without compromising the randomness, and munge them around somehow, you're far short of your target if you need millions or billions of bits a second for data transmission.
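If you did want to try the sound-card idea, the usual recipe is to keep only the least significant bit of each sample and then whiten the result. A hypothetical sketch follows; the simulated `samples` list stands in for a real capture, which the standard library can't do by itself:

```python
import random

def lsb_stream(samples):
    """Keep only the least significant bit of each audio sample."""
    for s in samples:
        yield s & 1

def von_neumann(bits):
    """Von Neumann extractor: 01 -> 0, 10 -> 1, 00/11 discarded.
    Removes any fixed 0/1 bias in the raw bit stream."""
    it = iter(bits)
    for a, b in zip(it, it):  # consume bits in non-overlapping pairs
        if a != b:
            yield a

def pack_bits(bits):
    """Pack a bit stream into bytes, dropping any trailing partial byte."""
    out = bytearray()
    byte = count = 0
    for bit in bits:
        byte = (byte << 1) | bit
        count += 1
        if count == 8:
            out.append(byte)
            byte = count = 0
    return bytes(out)

# Stand-in for captured signed 24-bit sound-card samples (simulated here).
rng = random.Random(0)
samples = [rng.randrange(-2**23, 2**23) for _ in range(10_000)]
random_bytes = pack_bits(von_neumann(lsb_stream(samples)))
print(len(random_bytes), "bytes harvested from", len(samples), "samples")
```

Note the cost of the whitening step: the extractor discards roughly three quarters of the raw bits, which only strengthens the point about throughput above.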
That said, I've actually seen software that uses this for low-volume needs, combined with other environmental factors. Most people just don't want to have to deal with DirectInput or MCI to get to it.
Chris,
Any PRNG gets seeded. System.Random takes the time (or a user-supplied value) as its seed; RandomNumberGenerator generates its seed from higher-entropy system parameters and presumably reseeds occasionally.
Alistair,
According to MSDN, RandomNumberGenerator is cryptographically secure, as I'd expect of any newer language's PRNG now that people actually care about that. Most of the third-party C and Fortran enhancements have been adopted by Python, .NET, and other recent languages. (Java's built-in sucks, though: http://alife.co.uk/nonrandom/ )
This is only going to be of any help to C/C++ coders, but there's an ANSI C header available called mt.h (Mersenne Twister) that generates a better spread of randomness than the default random number generator used with most C compilers.
Back when I used to code for the GBA using HAM, I would use mt.h instead of the included random function because it would generate random numbers much more efficiently on limited hardware. The results were staggeringly different. Using the built-in generator, I could get a ghost to appear on a field and usually tell where the ghost would wind up based on when I pressed the key. Using mt.h I could never predict anything, which is precisely what I was looking for.
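Python's standard `random` module is the same MT19937 generator as mt.h, which makes the earlier point about statistical quality versus cryptographic strength easy to demonstrate: Mersenne Twister output is completely predictable to anyone who learns its internal state. A small sketch:

```python
import random

# Python's `random` is MT19937 (Mersenne Twister). Statistically strong,
# but NOT cryptographically strong: knowing the 624-word internal state
# lets an observer predict every future output exactly.
rng = random.Random(1234)
state = rng.getstate()          # snapshot of the internal state

future = [rng.random() for _ in range(5)]

clone = random.Random()
clone.setstate(state)           # an "attacker" who captured the state...
predicted = [clone.random() for _ in range(5)]

assert predicted == future      # ...reproduces the stream perfectly
```

(In practice an attacker doesn't even need the state handed over: observing enough raw outputs is sufficient to reconstruct it, which is exactly why Schneier and Ferguson warn against library PRNGs for cryptography.)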
"The very philosophical might even suggest that there is in fact nothing random at all, about anything, anywhere. It's only that we lack data and/or methods of prediction."
I'll go with that. The trick isn't about creating random numbers (I'll challenge you all to define random; I sure as hell can't, pardon the language). The trick is about creating the hardest possible numbers to predict.
Okay, I might have gotten too philosophical with this, but that's the way I like it.