Speed Hashing

codinghorror · April 6, 2012, 12:00am

Hashes are a bit like fingerprints for data.

A given hash uniquely represents a file, or any arbitrary collection of data. At least in theory. This is a 128-bit MD5 hash you're looking at above, so it can represent at most 2¹²⁸ unique items, or 340 trillion trillion trillion. In reality the usable space is substantially less; you can start seeing significant collisions once you've filled ~~half~~ the square root of the space, but the square root of an impossibly large number is still impossibly large.

This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2012/04/speed-hashing.html

MarioF · April 6, 2012, 12:00am

And don’t forget scrypt (http://www.tarsnap.com/scrypt.html) which does tend to offer a bit more protection against brute force attacks.

adam1 · April 6, 2012, 12:00am

If you are a user:

Make sure all your passwords are 12 characters or more, ideally a lot more.

What’s one to do when a bank/insurance company/government website’s server-enforced password policy is “6-8 numbers or letters, no special characters”?

Stolen_Data · April 6, 2012, 12:00am

“Take a look at what I’m wearing, people. You think anybody wants a roundhouse kick to the face while I’m wearing these bad boys? FUGGEDABOUDIT!”

codinghorror · April 6, 2012, 12:00am

Cry.

But seriously: switch bank/insurance/government agencies if you can. Also try accented characters in UTF-8 to increase the entropy, if pôssïblé?

MirceaC · April 6, 2012, 12:00am

@Jeff, I’d rather use two factor. Wait… I already do. Google uses it. My bank uses it. I use Google to sign in with almost everything (OpenID FTW!).

CB1 · April 6, 2012, 12:00am

While passphrases may well be better than passwords, it strikes me as folly to rely on memory at all. Only the most conscientious are going to maintain the kind of discipline that approach requires, given the dozens of logins most of us have.

I can’t see anything for it right now other than to use software assistance of some sort. I use 1Password right now, but have used other systems. I have no idea what most of my passwords are, just that they’re usually 15ch or more.

Pies · April 6, 2012, 12:00am

So what you’re saying is that people need to use good, long passphrases to be secure? Gosh

One more tip I’d add is to hash the username together with salt and password. Obviously if you save the username together with hashes and the attacker knows your algorithms it doesn’t make you much more secure, but either way it slows down some types of attacks.

WilliamF · April 6, 2012, 12:00am

Hashing isn’t just used for security. Everything you’ve said here is really only for cryptographic hash functions (SSL, password storage, document authentication, etc.).

It is certainly acceptable to roll your own generic hash functions, as used in what is probably the most popular data structure, hash tables. I will almost always know more about my data set than a generic hash function designer (e.g. java.lang.String.hashCode).

For this usage, security isn’t a concern, only speed and reducing the possibility of collision within my data set.

http://en.wikipedia.org/wiki/Cryptographic_hash_function

AnthonyF · April 6, 2012, 12:00am

Very good post! I wrote almost the same post on my blog last year: http://blog.ircmaxell.com/2011/08/rainbow-table-is-dead.html

One point though. Cryptographic hashes are supposed to be fast, not slow. The faster the hash, the better in that it is faster to verify the message. Otherwise, we’d be using iterated hash functions for signing data, which we’re not. The primary defense against brute forcing is not the speed, but the shear size of the output space (which is why most modern hash algorithms are 256 bit or better). The speed of the function is seen as an advantage because it lets you quickly tell the validity of the message.

Now, for password hashing, this speed is a negative since it’s very fast to brute force an entire search space (relative to documents). And that’s why iterated password hashing algorithms (such as scrypt, bcrypt and pbkdf2) exist. To make the fast hashing function slower for its uses…

Otherwise, very well said…

Anthony

TiniusA · April 6, 2012, 12:00am

If you happened to use www.thepasswordproject.com as background for this article, I’d appreciate a link If not, there is some hashcat stuff there could be interesting to your readers.

SaltwaterC · April 6, 2012, 12:00am

bcrypt is anything but new. In fact, it is one of the old kids around the block. However, people need to avoid using the broken implementation, revealed by CVE-2011-2483.

Sp0rkbomb · April 6, 2012, 12:00am

Good post. However, the part about salted hashing is somewhat misleading. Salted hashes (in this form of usage, not in all) are there to slow down hashing, hiding the salt is (as you explain) pointless. This is also why shadow(3) stores hashes in the exact same line as the hash: It doesn’t matter.

To explain this a bit more thoroughly: Salts must be chosen on a per-password basis in order to provide any hardening against brute forcing. They then provide the additional security that if two of your users (let’s call them ‘steve’ and ‘john’) have the same password ‘password’ and you use their usernames as salt, their two password hashes will still be different, because H(‘stevepassword’) and H(‘johnpassword’) are completely different (provided you’re using a cryptographic hash).
With unsalted hashes (or when the salt is the same for the whole database), as soon as anyone knows the hash of ‘password’, they can also check through all other rows if they have the same hash.
With salted hashes, you have to hash your entire character set against any different salt and check it against the corresponding row.

ChipM · April 6, 2012, 12:00am

It seems like focussing on the speed of cracking a password misses a more important point: in real work systems we can control the number or speed of logins. The way that’s typically implemented is to only allow N wrong logins and then logging the user out. A better alternative in my opinion is simply to double the time between logins, making brute force attacks much slower.

Of course, that creates the possibility of a DOS attack to prevent a particular user from logging in (presuming you know the userid). There probably are good application-specific solutions there (e.g. registering the account to a specific IP or IP block, with a registration process if static IPs can’t be guaranteed) and nothing beats having a good sys admin with proper network monitoring tools.

codinghorror · April 6, 2012, 12:00am

@ChipM that is a totally different issue. All these attacks assume the database has been compromised and released into the wild. This happens all the time, it’s quite common.

http://www.codinghorror.com/blog/2010/12/the-dirty-truth-about-web-passwords.html

(or an internal employee decides to hack the CEO’s password…)

As for attacking websites, even with rate limited logins (of course a must) a basic dictionary approach will work wonders, see:

I don’t want to draw any conclusions from such a small sample size, but at a minimum we can see that it would have been possible to crack more than a third of the user accounts in both leaks using five easy-to-get dictionaries.

Dictionary Attacks 101 describes a real life Twitter attack that worked this way.

Fernandoacorrei · April 6, 2012, 12:00am

Great tips about the server side. On the user side, I recommend open source password keeping software such as KeePass and Password Safe. I’ve been using Password Safe for years to generate long, secure passwords I don’t need to remember.

MatthewH · April 6, 2012, 12:00am

“A given hash uniquely represents a file, or any arbitrary collection of data. At least in theory.”

No. No. No

A given hash does not uniquely represent a file, or any arbitrary collection of data in theory, but if well designed it can in practice.

You actually understand this (because the rest of your article explains just why there are problems if the hash function is no longer deemed secure) but utterly misstate the problem in your first line!

The whole point of reasonable output sized crypto secure hashes for identity is that the probabilities of collision outweigh the fact that collisions are, in the infinite sense, inevitable.

Sofa420 · April 6, 2012, 12:00am

I usually use this function for password generation.

var generatePassword = function(username, password){
    var passwordData = {}
    passwordData.salt = hash(hash(username) + (new Date()).valueOf())
    passwordData.password = hash(passwordData.salt + hash(password))
    return passwordData
}

Yes, 4 hashes may be overdoing it. But this way even if they find a collision for the hashed password, chances that they’ll actually find the plaintext version are slim.

SanderD · April 6, 2012, 12:00am

While I was skeptical about your twitter post earlier, you’ve got some valid points. I’ve done some MD5 GPU experiments before but never managed to get quite the throughput you’ve got here.

Another slightly related point is that, apart from the maximum or charset requirements some sites put (minimum, okay, maximum, say what?!)…
I’d like to vent my utter frustration about so called “security questions”. I’m shouting my cats name every night. Secure. So… very… secure.

And if you’ll excuse me now, I’ll try to remember my new 8-word passphrase. Thanks for the nice article!

Eitan_Adler · April 6, 2012, 12:00am

Use scrypt if you have a choice, not bcrypt or PBKDF2: http://www.tarsnap.com/scrypt.html