Speed Hashing

Good post, but talking about hash functions being broken when your authentication system is a password is basically rearranging the deck chairs on the Titanic.

Passwords are broken much, much worse than MD5 is broken.

Anybody with data to protect who thinks they can’t tolerate the compromise of a password probably shouldn’t be using a password.

Any user relying on a password should assume it will be compromised and be ready to change it.

All this commentary on the weakness of MD5 and SHA-1 and no mention of NIST’s competition for a new cryptographic hash algorithm?

http://csrc.nist.gov/groups/ST/hash/

“In reality the usable space is substantially less; you can start seeing significant collisions once you’ve filled half the space, but half of an impossibly large number is still impossibly large.”

Actually, thanks to the birthday paradox, you can expect to see a collision after using the square root of the number of valid identifiers, and the page you linked to even says as much. You’ve made a common mistake: 2^64 is not half of 2^128.
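
To put numbers on it, here’s a quick back-of-the-envelope sketch (my own illustration, not something from the linked page) of the birthday bound for a few digest sizes:

```python
import math

# Birthday bound: for a b-bit digest, a collision becomes likely (~50%)
# after roughly sqrt(2 * ln 2) * 2^(b/2) random digests, i.e. on the
# order of 2^(b/2), nowhere near 2^(b-1).
def birthday_bound(bits):
    return math.sqrt(2 * math.log(2) * 2.0 ** bits)

for bits in (32, 64, 128):
    n = birthday_bound(bits)
    print(f"{bits}-bit digest: collision likely after ~2^{math.log2(n):.1f} hashes")
```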

There’s one easy way to get people to use long passwords. Tell them to figure out an easy phrase to remember and add it to the end of their password every time. Done.

That way, “fred” (a terrible password) becomes “fredthehungergames” (actually a pretty good password). Same at every site. Increasing the length adds security far more than increasing the complexity of each character.
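
A rough back-of-the-envelope comparison makes the point; the specific alphabets and lengths below are my own illustration, not from the comment:

```python
import math

# Search space is roughly alphabet_size ** length, so extra length wins
# quickly over extra per-character complexity.
cases = {
    "8 chars drawn from 94 printable symbols": 94 ** 8,
    "18 chars of lowercase letters only": 26 ** 18,
}
for label, space in cases.items():
    print(f"{label}: ~2^{math.log2(space):.0f} possibilities")
```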

Hi Jeff,

You’ve shared a lot of valuable information here, especially with regard to the password cracking part! However, I find the subheading “Hashes are designed to be slow” a bit too broad. If that were true, how would you explain the popularity of MurmurHash, Jenkins, and FNV-1, all of which are designed for speed? Your article would be more accurate if it said “secure hashes” or “cryptographic hashes” here (and in a few other places). Otherwise, interesting article as usual.
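
For contrast, here is roughly what one of those speed-oriented, non-cryptographic hashes looks like. This is a minimal FNV-1a sketch (a close relative of the FNV-1 mentioned above), added purely for illustration:

```python
def fnv1a_32(data: bytes) -> int:
    # FNV-1a (32-bit): xor each byte in, then multiply by the FNV prime.
    # Very fast and very simple, but it offers no security whatsoever.
    h = 0x811C9DC5                              # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF       # FNV prime, kept to 32 bits
    return h

print(hex(fnv1a_32(b"hello world")))
```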

There is a security tradeoff with tunable work-factor hashing schemes (bcrypt and co.): it is really quite easy to launch a DoS against such a scheme, since by design they consume system resources. Sure, that’s not the same issue as a data breach, but if 2011 is any indication, DoS is popular in a big way again among a certain populace.

As for salting: it multiplies the work to crack by the number of users, regardless of whether the salt is secret or not. With per-user salts you have to brute-force passwords per user, since each candidate password has to be hashed together with each user’s salt, separately for every user in the database.
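
To illustrate why (a toy sketch, using plain salted SHA-256 purely for illustration, not as a recommendation): the attacker can no longer hash each candidate once and compare it against the whole table; every candidate must be re-hashed for every user’s salt.

```python
import hashlib

def crack(leaked_rows, candidates):
    # leaked_rows: (username, salt_bytes, hex_digest) tuples from a stolen database.
    found = {}
    for username, salt, digest in leaked_rows:      # work scales with number of users...
        for candidate in candidates:                # ...times number of candidate passwords
            if hashlib.sha256(salt + candidate.encode()).hexdigest() == digest:
                found[username] = candidate
                break
    return found
```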

All this is some pretty good advice, but a couple of points to tweak (since I’m an application security professional):

  • Terminology nit: hash functions produce digests. It doesn't matter most of the time, but when talking about both algorithms and what they produce, it's helpful to use the formal term -- hash for algorithm/function and digest for its output.
  • If you use a good enough hash algorithm (SHA2, Bcrypt, etc.) and a long enough random salt per-user, you are still in good shape against rainbow-table attacks.
  • Adding salt effectively increases the password length for the purposes of precomputed-table (rainbow table) attacks; so an 8-char password with a 10-char salt is effectively 18 chars. Note that this does not substitute for long passwords against any other type of guessing attack.
  • The best thing you can do in your password policy is require long passwords. Complexity requirements only matter if the system is limited to short passwords for some reason. Each extra character of length multiplies the cost of a brute-force attack by the size of the alphabet, so length grows the cost exponentially, while complexity rules only grow the base of that exponent.

The short version: as a developer, the best things you can do are allow/encourage long passwords, choose a good hash algorithm, and use long and random salts per-user. (And you’re right, you don’t need to hide the salts – that adds a lot of complexity for essentially no security value).
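
A minimal sketch of that advice, assuming Python and the third-party bcrypt package (the work factor of 12 is a placeholder; tune it for your own hardware):

```python
import bcrypt

def hash_password(password: str) -> bytes:
    # gensalt() generates a random per-user salt; it is stored, unhidden,
    # inside the returned hash string along with the work factor.
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt(rounds=12))

def check_password(password: str, stored: bytes) -> bool:
    return bcrypt.checkpw(password.encode("utf-8"), stored)

stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", stored))   # True
print(check_password("Tr0ub4dor&3", stored))                     # False
```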

Corrections:

  • Random collisions for MD5 start to become frequent when you have around 2^64 digests, which is the square root of the cardinality of MD5's output space, not 2^127, which is half the cardinality of MD5's output space. To provide k bits of security against random collisions, you need a cryptographic hash function providing 2k bits of output, not k+1 bits of output.
  • Cryptographic hash functions are usually designed to be as fast as possible across a wide variety of currently-in-use hardware, including servers, desktops/laptops, handheld devices, and embedded chips. A password hashing function has the unusual requirement that it should be very slow to compute, in addition to being a good cryptographic hash function. However, most other uses of cryptographic hash functions, such as HTTPS, the ETag header in HTTP responses, and the session-cookie signing in every Rails application, require an extremely fast function and have no use for a slow one (a small example follows).
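
For example, a content-derived ETag is a case where you want the digest to be as cheap as possible. This is a sketch of the general idea only, not the actual Rails implementation:

```python
import hashlib

def make_etag(body: bytes) -> str:
    # A fast cryptographic hash is exactly right here: speed matters,
    # and nobody is brute-forcing an ETag.
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'

print(make_etag(b"<html>hello</html>"))
```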

I’d love to use pass phrases…

Except my bank only allows a maximum of 16 characters for a password.

And even better, they require security questions, which I can’t just fill with garbage; every time I log in, I need to answer a random security question. So I need to either keep the security question answers honest so I can remember them or write them down somewhere.

We’re doing something wrong. This battle between better encryption and better cracking is just a never-ending arms race.

I am not sure what the solution is, but I think it is something along the lines of making the user database useless outside the realm where it is being used. Maybe have the entire database hashed against the administrator log-in, and that information is never written to disk, just cached in RAM?

Tying it to the hardware somehow? I don’t know what it is, but surely there is a better solution.

Jeff, you really should correct a few factual points here:

  • This whole post is about secure hashes. There’s such a thing as an (intentionally) insecure hash function, e.g. cityhash, spookyhash, or murmurhash.

In your nomenclature, all hashes are secure, and all insecure hash functions are checksums. This doesn’t match the way the words are commonly used, as you can observe by the fact that these three insecure hashes all have “hash” in their name.

  • Second, as others have pointed out, secure hashes are not designed to be slow. Some are. Most are not.

In general, I want SHA1 to be fast, because I use it for lots of things other than hiding passwords. If I have a lot of documents to hash, it’s great if I can implement my hash function on the GPU! Indeed, if you look at the SHA3 candidates, you’ll see that they’re explicitly competing on speed.

“If you could mimic another person’s fingerprint or DNA at will, you could do some seriously evil stuff. MD5 is clearly compromised, and SHA-1 is not looking too great these days.”

Collision attacks against MD5/SHA1 are nothing like mimicking another person’s DNA; they are like the ability to create twins who share the same DNA. As you can guess from the lack of frauds committed by evil twins, that ability is not very dangerous. The same is true in cryptography: collision attacks can be problematic in some cases (such as when a hash is used to digitally sign a message), but completely irrelevant to cracking passwords.

As for figuring out the plaintext which hashes to a certain string, that is called a first preimage attack, and no such attack is known against either SHA1 or MD5 (or even MD4). So SHA1/MD5 is just as secure for password hashing as SHA-256 or any other fast cryptographic hash. (Which is to say, not very secure - as the article correctly explains.)

“Use bcrypt or PBKDF2 exclusively to hash anything you need to be secure. These new hashes were specifically designed to be difficult to implement on GPUs.”

I don’t think PBKDF2 was specifically designed to be GPU-unfriendly - it was just designed to be slow.
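
For what it’s worth, PBKDF2 is in the Python standard library, and the slowness is just the iteration count. The count below is a placeholder; pick one that takes a noticeable fraction of a second on your own hardware:

```python
import hashlib, os

salt = os.urandom(16)
# PBKDF2-HMAC-SHA256; the fourth argument is the iteration count that
# makes each guess expensive, for the attacker and for you alike.
digest = hashlib.pbkdf2_hmac("sha256", b"hunter2", salt, 200_000)
print(digest.hex())
```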

Thanks. At first I thought your post was just a bunch of facts any professional programmer should have learned in college, but I had no idea how fast some of these hashes can be cracked nowadays.
Cool stuff

I agree with Jeroen Jacobs: there is a whole other side of hashing, aimed at speed and minimal collisions.

Jeff, you might be interested in a paper I just published on using hash digests for both message and content identity. We make it possible to figure out if two files, even if they might not be the same format, contain the same content: http://tw.rpi.edu/web/doc/parallelIdentitiesOGD

The early Unix password scheme involved running the hash iteratively - 1000 times IIRC. How effective is this at strengthening hashes these days? Does running SHA256 on a password 64k times make it take 64k times as long to check a possible password, using these GPU attacks?
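
For reference, the “run it many times” idea looks roughly like this (a sketch of naive key stretching in general, not the actual Unix crypt scheme); each added iteration multiplies the per-guess cost for the attacker the same way it multiplies it for the legitimate login check:

```python
import hashlib

def stretched_hash(password: bytes, salt: bytes, iterations: int = 65536) -> bytes:
    # Feed the digest back into the hash repeatedly. PBKDF2, bcrypt, and
    # scrypt are more careful versions of this same idea.
    digest = salt + password
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest
```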

http://www.schneier.com/blog/archives/2012/03/the_security_of_5.html
http://www.lightbluetouchpaper.org/2012/03/07/some-evidence-on-multi-word-passphrases/

“The results are discouraging: by our metrics, even 5-word phrases would be highly insecure against offline attacks, with fewer than 30 bits of work compromising over half of users.”

TL;DR - passphrases are better than passwords, but they’re far from perfect.

Also, securing the password exchange is becoming more and more important these days. Especially on non-SSL web servers (or on any non-encrypted connection). A nice way to do that is by using SRP: http://srp.stanford.edu/.

A little humor for the day: on the subject of “long enough” passwords, I saw this over at TheDailyWTF.com:

http://img.thedailywtf.com/images/12/q2/err7/pic4.png


Is there any way for a hacker to determine what type of hashing algorithm was used?

Let’s say a hacker got hold of a password database. Given a set of hashes, can he tell which hashing algorithm was used to generate them?
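
Often you can make a decent guess from the stored format alone. Here is a rough sketch of the usual heuristics (digest length and crypt-style prefixes), not an exhaustive identifier:

```python
def guess_hash_format(stored: str) -> str:
    # Heuristics only: many password-hash schemes announce themselves by
    # prefix, and bare hex digests can be narrowed down by length.
    if stored.startswith(("$2a$", "$2b$", "$2y$")):
        return "bcrypt"
    if stored.startswith("$1$"):
        return "md5-crypt"
    if stored.startswith("$5$"):
        return "sha256-crypt"
    if stored.startswith("$6$"):
        return "sha512-crypt"
    if set(stored.lower()) <= set("0123456789abcdef"):
        return {32: "MD5 (or NTLM)?", 40: "SHA-1?", 64: "SHA-256?",
                128: "SHA-512?"}.get(len(stored), "unrecognized hex digest")
    return "unknown"

print(guess_hash_format("5f4dcc3b5aa765d61d8327deb882cf99"))   # "MD5 (or NTLM)?"
```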