Hacker, Hack Thyself

codinghorror · June 2, 2017, 8:21am

We've read so many sad stories about communities that were fatally compromised or destroyed due to security exploits. We took that lesson to heart when we founded the Discourse project; we endeavor to build open source software that is secure and safe for everyone by default, even if there are thousands, or millions, of them out there.

This is a companion discussion topic for the original entry at https://blog.codinghorror.com/hacker-hack-thyself/

Roland3141592 · June 2, 2017, 10:30am

Great post, but I have a comment on the idea that putting 8 GPU cards in a machine is within reach of a small business budget… Please have a look at the smallest Dutch supercomputer, the Little Green Machine, which has 4 machines with 4 GPU cards each. It looks a bit more complicated to build than a wealthy business person could manage. Still, impressive performance. Link to the Little Green Machine. I have read another site with even more technical details, but do not have the URL at hand, sorry. I do remember that the InfiniBand interconnect is one of the main cost factors.

fazalmajid · June 2, 2017, 10:46am

You should consider argon2. It has a tunable memory factor to preclude massively parallel brute-forcing as that would require too much RAM to be feasible.
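
To put rough numbers on the memory argument, here is a back-of-the-envelope sketch. The 64 MiB per-hash memory cost and the 4,096 parallel guesses are illustrative assumptions, not figures from the post or from any argon2 recommendation.

```python
# Rough estimate of the RAM an attacker needs to run many memory-hard
# hashes in parallel. All numbers below are illustrative assumptions.

MIB = 1024 * 1024
GIB = 1024 * MIB

def ram_needed_bytes(memory_per_hash_bytes: int, parallel_guesses: int) -> int:
    """Each in-flight guess needs its own working memory."""
    return memory_per_hash_bytes * parallel_guesses

memory_per_hash = 64 * MIB   # assumed memory cost per hash
parallel_guesses = 4096      # assumed number of simultaneous guesses

total = ram_needed_bytes(memory_per_hash, parallel_guesses)
print(f"{total // GIB} GiB of RAM for {parallel_guesses} parallel guesses")
```

Raising the memory cost scales the attacker's RAM bill linearly, which is the property that makes massive parallelism expensive.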

imjustanoob · June 2, 2017, 10:46am

It is definitely within the reach of a small business. They can follow a guide and build their own for about $10k.

Or just buy one premade for about $22k:
https://sagitta.pw/hardware/gpu-compute-nodes/brutalis/

hultner · June 2, 2017, 12:10pm

Very nice writeup!
I love seeing how well the good old Blowfish in BSD crypt() holds up.
Will keep this post in mind next time I’m working with password hashes.

hcarlens · June 2, 2017, 12:12pm

that’s “only” 8^10 combinations, a little over one billion

Did you mean 10^8, or 100 million?
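
For reference, the two readings can be checked directly. This is a quick sketch that assumes the quoted sentence is about passwords made of eight decimal digits; `digit_passwords` is a hypothetical helper name.

```python
# Compare the two readings of the quoted figure.
print(f"8^10 = {8 ** 10:,}")   # eight choices per position, ten positions
print(f"10^8 = {10 ** 8:,}")   # ten choices per position, eight positions

def digit_passwords(length: int) -> int:
    """Number of distinct passwords made only of decimal digits."""
    return 10 ** length

print(f"all 8-digit passwords: {digit_passwords(8):,}")
```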

JamesB7 · June 2, 2017, 12:36pm

A hash type table? You don’t need to reinvent the wheel. The crypt hash format already specifies type, salt, and difficulty as determined by the algorithm. Just use bcrypt, and when you need to strengthen passwords, rehash on login and raise the rounds parameter by 1 (it’s 2^rounds). There are plenty of other crypt algorithms as well – $5$ (Linux SHA-256) and $S$ (Drupal) to name a few. Passwords aren’t hard unless you’re unwilling to use libraries other people have made.
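
A minimal sketch of the rehash-on-login idea, under some loud assumptions: PBKDF2 from the Python standard library stands in for bcrypt so the example has no third-party dependencies, and the `pbkdf2_sha256$iterations$salt$hash` storage format, the function names, and the iteration count are all hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical self-describing format: "pbkdf2_sha256$<iterations>$<salt hex>$<hash hex>".
# PBKDF2 is a stand-in for bcrypt here, not a recommendation.
ALGORITHM = "pbkdf2_sha256"
CURRENT_ITERATIONS = 200_000  # assumed target work factor

def hash_password(password: str, iterations: int = CURRENT_ITERATIONS) -> str:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"{ALGORITHM}${iterations}${salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    if algorithm != ALGORITHM:
        return False  # unknown scheme; a real system would dispatch on this field
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))

def needs_rehash(stored: str) -> bool:
    """True when the stored hash used a lower work factor than the current target."""
    return int(stored.split("$")[1]) < CURRENT_ITERATIONS

def login(password: str, stored: str):
    """Return (ok, stored_hash); the hash is upgraded if it was too weak."""
    if not verify_password(password, stored):
        return False, stored
    if needs_rehash(stored):
        # The plaintext is only available at login, so this is the moment to upgrade.
        stored = hash_password(password)
    return True, stored
```

Because the algorithm name and work factor travel with each stored hash, old and new hashes can coexist in the same column while users are upgraded one login at a time.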

Roland3141592 · June 2, 2017, 12:39pm

You are right that these numbers show that the security should be further improved. But the hacking example clearly shows that, no matter how many password requirements you implement, most of the password length is wasted on common words and names, and the extra chars are often something like “123”. If we switched to automatically generated random passwords, we could do with much shorter ones. According to An Administrator’s Guide to Internet Password Research, also published in CACM (Nov 2016) and presented at USENIX (2014), it is sufficient to have a password space of 10^6, i.e. 5 chars, combined with throttling against password-guessing hackers.

SimBa · June 2, 2017, 12:55pm

Why wouldn’t you use SRP so that passwords are not stored in the database? https://en.m.wikipedia.org/wiki/Secure_Remote_Password_protocol

SeriousM · June 2, 2017, 2:57pm

Having a strong algorithm is nice, but why not just wrap the hashes?
It’s much harder to guess a hash’s algorithm if it is wrapped in several other algorithms, and an attacker then needs custom tools to brute-force the hashes instead of just using hashcat.

MarkRansom · June 2, 2017, 3:35pm

You might want to add a few canary users to the database. These would be users that don’t actually exist, with passwords that are easy to crack. If anyone ever tries to log into one of those, you know your database has been breached.
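
A rough sketch of what that check could look like. The usernames, the `check_login_attempt` function, and the alert list are all hypothetical; a real system would page someone rather than append to a list.

```python
# Hypothetical canary accounts: they belong to no real person, so any
# login attempt against them is suspicious.
CANARY_USERS = {"canary_alice", "canary_bob"}

def check_login_attempt(username: str, alerts: list) -> bool:
    """Record an alert and return True if the attempt targets a canary account."""
    if username in CANARY_USERS:
        alerts.append(f"possible database breach: login attempt for {username!r}")
        return True
    return False

alerts = []
check_login_attempt("canary_alice", alerts)   # raises an alert
check_login_attempt("regular_user", alerts)   # ignored
print(alerts)
```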

Jason_Youngquist · June 2, 2017, 4:05pm

Have you thought about doing away with passwords altogether?

gtd · June 2, 2017, 4:18pm

The nightmare scenario explored here is when the database is leaked though, whereas the paper is talking about throttling the login endpoint. If an attacker has the DB, the only “throttling” that can be done is via a more time-consuming hashing algorithm.
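
To make the difference concrete, here is a sketch comparing the two situations for the 10^6 password space mentioned earlier in the thread. Both guessing rates are made-up assumptions for illustration only.

```python
def seconds_to_exhaust(password_space: int, guesses_per_second: float) -> float:
    """Time to try every password in the space at a fixed guessing rate."""
    return password_space / guesses_per_second

password_space = 10 ** 6     # figure cited earlier in the thread
online_rate = 10 / 3600      # assumed throttle: 10 guesses per hour at the login endpoint
offline_rate = 1_000         # assumed: leaked DB, slow hash, 1,000 guesses per second

online_days = seconds_to_exhaust(password_space, online_rate) / 86_400
offline_minutes = seconds_to_exhaust(password_space, offline_rate) / 60

print(f"online, throttled: about {online_days:,.0f} days")
print(f"offline, leaked DB: about {offline_minutes:.1f} minutes")
```

With the throttle gone, the only knob left on the defender's side is the cost of a single hash.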

kb7iuj · June 2, 2017, 4:29pm

Mark, I’m no database expert (I’m about 2/3 of my way through a bachelor’s degree in computer science), but I’m thinking your idea is actually a great one - with a tweak: don’t have a few canary users… have many canary users. As in pow(2, n) - 1 canary users for every real user.

If the username lookup is a binary search, this adds n steps to the lookup time (which is going to be trivial), at the cost of bloating the database somewhat. But disk space is getting pretty inexpensive these days…

(I admit, maintenance of this would be a pain… some analysis could separate fake users from real ones if it isn’t done right.)
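
A quick sketch of the arithmetic, assuming a sorted list searched with a plain binary search. The one million real users and n = 10 are arbitrary illustrative values.

```python
def worst_case_steps(total_users: int) -> int:
    """Worst-case comparisons for a binary search over total_users sorted entries.

    For N >= 1 this is floor(log2(N)) + 1, which equals N.bit_length().
    """
    return total_users.bit_length()

real_users = 1_000_000   # assumed number of real accounts
n = 10                   # 2**n - 1 canaries per real user

total_users = real_users * 2 ** n   # each real user plus their canaries

baseline = worst_case_steps(real_users)
with_canaries = worst_case_steps(total_users)

print(f"rows: {real_users:,} -> {total_users:,}")
print(f"worst-case lookup steps: {baseline} -> {with_canaries}")
```

The lookup cost grows by n steps, but the row count grows by a factor of 2^n, so the storage side of the trade-off gets expensive quickly as n rises.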

Matej_Kramny · June 2, 2017, 4:55pm

Hey @codinghorror, what are your thoughts on this article? https://engineering.gosquared.com/evolution-of-password-security

It would be interesting if that was actually a way to make hashes more “secure”.

jdconley · June 2, 2017, 6:15pm

I built a little thing to make cracking ridiculously hard: https://github.com/jdconley/pwhaas

The website cert is out of date, but the code is there. Using it in production.

A full second of hashing with argon2 on modern hardware where the hash is recomputed to the latest and greatest in the background after a login is a pretty good defense, I imagine.

travisnorthcutt · June 2, 2017, 7:34pm

This strikes me as brilliantly simple. Have you heard of anyone doing something like this in production?

kb7iuj · June 2, 2017, 7:48pm

Nope! Databases are way out of my expertise. (At least, my knowledge of them goes back to PHP4/MySQL. I haven’t practiced much in the last ten years.)

Wladimir_Palant · June 2, 2017, 9:54pm

One aspect that you left out here: GPUs might not be your worst enemy. FPGAs can also be used to crack passwords efficiently, which is not to be underestimated if you are expecting a dedicated attacker. And while bcrypt has been designed to be inefficient on GPUs, it can still be cracked rather efficiently on FPGAs. That’s the issue that scrypt is meant to address.
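
For anyone who wants to try scrypt, a minimal sketch using `hashlib.scrypt` from the Python standard library (available when Python is built against a recent OpenSSL). The parameters are illustrative assumptions, not a tuning recommendation, and `scrypt_hash` is a hypothetical wrapper name.

```python
import hashlib
import os

def scrypt_hash(password: str, salt: bytes, n: int = 2 ** 14, r: int = 8, p: int = 1) -> bytes:
    """Derive a 32-byte key with scrypt.

    scrypt needs roughly 128 * n * r bytes of working memory per hash,
    which is about 16 MiB with these illustrative parameters.
    """
    return hashlib.scrypt(
        password.encode(),
        salt=salt,
        n=n,
        r=r,
        p=p,
        maxmem=64 * 1024 * 1024,  # allow up to 64 MiB so the call is not rejected
        dklen=32,
    )

salt = os.urandom(16)
key = scrypt_hash("correct horse battery staple", salt)
print(key.hex())
```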

Also, the comparison in your post is misleading: the number of hashes per second means nothing if you don’t mention the corresponding number of iterations. All of these algorithms allow tweaking this, to make the task more computationally intensive as the available hardware improves over time.
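
A small sketch of why the iteration count has to be quoted alongside any hashes-per-second figure. It times PBKDF2-HMAC-SHA256 from the Python standard library at a few iteration counts; the counts are arbitrary, and the absolute numbers depend entirely on the machine running it.

```python
import hashlib
import time

def time_pbkdf2(iterations: int) -> float:
    """Return the seconds taken to compute one PBKDF2-HMAC-SHA256 hash."""
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"password", b"0123456789abcdef", iterations)
    return time.perf_counter() - start

for iterations in (1_000, 10_000, 100_000):
    elapsed = time_pbkdf2(iterations)
    print(f"{iterations:>7} iterations: {1 / elapsed:,.0f} hashes/sec on this machine")
```

The same algorithm on the same hardware gives very different hashes-per-second figures depending on the iteration count, so a rate quoted without it is not comparable.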

codinghorror · June 2, 2017, 10:38pm

Is argon2 considered proven, battle tested, and ready for wide-scale adoption? The Wikipedia page on it is unclear. I don’t think crypto people like to use things that are too new for that reason…

I think you misunderstood what I meant. To change the hash we use at Discourse we have to support both the old and new formats in the Discourse code (as well as any future formats).

I guess that is sort of true but it feels to me like security through obscurity. Just increase the work factor.

Definitely a neat idea, but it’d need to be a unique canary user per site.

That’s fine, and overlap with bitcoin specific hardware is definitely bad, as in, there may be some monster hashing hardware out there. Do you have any links of where to buy it, numbers, etc? The main takeaway I have is “don’t pick any password hash that remotely resembles what bitcoin uses” as that is a rich vein of madness.

This is covered several times in the post. Search for “there are two factors that go into password hash strength” if you don’t believe me.