And yet your “examples” say…
> dictionary file containing about 26 million words, combined with programming rules that greatly extend its effectiveness by adding numbers, punctuation, and other characters
>
> It combines each word in a dictionary with every other word in the dictionary
That’s a total of two words combined, plus some mutations. Absolutely not:
> polling all books in existence and many websites and forums for phrases including combinations of nonsensical words
Combining two words and mutating seems reasonable. Let’s say most people have 20k words they regularly use: 20k times 20k is 400 million pairs, which is a damn far cry from “every phrase in every book ever written”, like you said.
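To put rough numbers on the two-word case, here is a quick sketch. The 20k vocabulary is my guess from above, and the 1,000 mutations per pair is a made-up multiplier purely for illustration, not a figure from any cracking tool.

```python
# Rough size of a "combine every word with every other word" attack.
VOCAB = 20_000               # assumed everyday vocabulary size
MUTATIONS_PER_PAIR = 1_000   # hypothetical: digits, punctuation, case variants

pairs = VOCAB ** 2                        # ordered pairs, repeats allowed
candidates = pairs * MUTATIONS_PER_PAIR   # pairs after applying mutations

print(f"two-word pairs: {pairs:,}")
print(f"with mutations: {candidates:,}")
```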
Now this Ars article… is very interesting. It is what you described. In fact, here’s a long word list that guy cracked and it is fascinating reading. There is some potential there, but realize this guy is custom generating 1TB worth of phrases every time based on the target:
> Plucking long word groupings out of books and articles and turning them into working cracking dictionaries is no trivial undertaking. For one thing, it requires huge amounts of disk space. Dustin works around the challenge mostly by filling up his 1TB hard drive with a list, using it to generate guesses against his uncracked hashes, wiping the drive clean, and starting all over with a new list of phrases.
One of the highlighted Ars commenters at the bottom of that article answered your question, too:
> So is “correct horse battery staple” still the right type of password to use? It doesn’t seem like just stringing together a few dictionary words is sufficient any more. Surely putting together random, but common, dictionary words is in the cracker’s arsenal as well.
>
> There are ~750,000 words in the english language. Even without substitutions, capitalizations, or weird spacing, that represents about 10^23 combinations if you picked 4 at random. You could test a billion combinations a second and finish sometime in the next 4 million years. But you said common words…
>
> Average adult vocabulary is 20,000-35,000 words. Let’s assume that people who voluntarily test their vocabulary are probably on the high end of the bell curve in terms of word usage, and cut that low number in half. That leaves us with 10,000 words, and 10^16 ways to combine them (again if we picked just 4 at random to make our random passphrase). Generating a million hashes per second (pretty damn fast), it would take our cracker about 120 days to go through the combinations, and consume 284PB if he decides to store it as a lookup table. And that’s just from choosing 4 random commonly used words. If you went to 5, or did decided to capitalize the first and last letters, or the first letter of every word, or put a random space in there, or included a “word” made up from the first letters of all the other words (i.e., “correct horse battery staple chbs”)…well the numbers get astronomical very quickly.
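The keyspace part of that comment is easy to check. This is a minimal sketch, assuming each word is picked independently with repeats allowed (which is how 10,000 words gives 10^16 for four picks); the function name is mine.

```python
import math

def passphrase_keyspace(vocab_size: int, words: int) -> int:
    """Number of possible passphrases when each word is chosen
    independently (repeats allowed) from a list of vocab_size words."""
    return vocab_size ** words

for vocab in (750_000, 10_000):
    for n in (4, 5):
        space = passphrase_keyspace(vocab, n)
        print(f"{vocab:>7,} words, {n} picked: "
              f"about 10^{math.log10(space):.1f} "
              f"({math.log2(space):.1f} bits)")
```

Going from four words to five multiplies the keyspace by the size of the word list, which is why the numbers climb so fast.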
The commenter was lowballing a hell of a lot on that hashes per second figure, though. Per the GRC haystack page:
- Offline fast attack: 100 billion guesses per second
- Nation state: 100 trillion guesses per second
One million guesses per second is pretty… quaint by today’s standards.
- 1,000,000 = one million
- 1,000,000,000 = one billion
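So what he calls 120 days actually matches a rate of about one billion hashes/sec, not one million: 10^16 combinations / 10^9 per second is 10^7 seconds, or about 116 days. At the 100 billion guesses per second offline figure, divide by another 100 and you get roughly 1.2 days, which seems pretty realistic on today’s hardware.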
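If you want to run these numbers yourself, here is a small sketch for the four-words-from-10,000 case. The rates are the ones mentioned in this thread; the helper name is mine, and this is the worst case of walking the entire keyspace.

```python
SECONDS_PER_DAY = 86_400
KEYSPACE = 10_000 ** 4   # four words picked from a 10,000-word list

# Guess rates discussed above (guesses per second).
RATES = {
    "one million/sec": 10 ** 6,
    "one billion/sec": 10 ** 9,
    "offline fast attack (GRC)": 10 ** 11,
    "nation state (GRC)": 10 ** 14,
}

def days_to_exhaust(keyspace: int, guesses_per_second: int) -> float:
    """Days needed to try every candidate at a constant guess rate."""
    return keyspace / guesses_per_second / SECONDS_PER_DAY

for label, rate in RATES.items():
    print(f"{label:<28} {days_to_exhaust(KEYSPACE, rate):>14,.4f} days")
```

On average an attacker finds a given password about halfway through, so halve these for the expected case.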
Also, consider the sheer number of hashes (passwords) you have. It seems reasonable that you’d beat about 50% of them with short passwords alone (assuming an 8 character average password, which is nothing these days), plus common wordlists, common mutations, and a little brute force.
But those are small lists. You would go:
- from a few billion possible words + mutations per password
- to hundreds of trillions of words, phrases, and mutations per password
So even if you had that 1TB hard drive full of custom phrases derived from… books? magazines? movies? TV? (what’s the target again?), you have expanded the amount of work by many orders of magnitude.