Hacker, Hack Thyself

I built a little thing to make cracking ridiculously hard: https://github.com/jdconley/pwhaas

The website cert is out of date, but the code is there. Using it in production.

A full second of hashing with argon2 on modern hardware where the hash is recomputed to the latest and greatest in the background after a login is a pretty good defense, I imagine.
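
For anyone wondering what "recompute after a login" looks like in practice, here is a minimal sketch. It is not the pwhaas implementation: it swaps in PBKDF2 from Python's standard library instead of argon2 so that it runs without extra packages, and `CURRENT_ITERATIONS`, `hash_password`, and `verify_and_upgrade` are names made up for illustration.

```python
import hashlib
import hmac
import os

# Hypothetical policy value: the work factor the site currently wants.
CURRENT_ITERATIONS = 600_000


def hash_password(password: str, iterations: int = CURRENT_ITERATIONS) -> dict:
    """Return a record holding everything needed to verify the password later."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return {"salt": salt, "iterations": iterations, "digest": digest}


def verify_and_upgrade(password: str, record: dict) -> tuple:
    """Check the password; if it matches and the record is outdated, rehash it.

    Returns (ok, record), where record is the possibly upgraded one to store.
    """
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), record["salt"], record["iterations"]
    )
    if not hmac.compare_digest(candidate, record["digest"]):
        return False, record
    if record["iterations"] < CURRENT_ITERATIONS:
        # The plaintext only exists at login, so this is the moment to upgrade.
        record = hash_password(password)
    return True, record
```

The stored record carries its own parameters, which is what lets old and new hashes coexist until each user next logs in.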

This strikes me as brilliantly simple. Have you heard of anyone doing something like this in production?

Nope! Databases are way out of my expertise. (At least, my knowledge of them goes back to PHP4/MySQL. I haven’t practiced much in the last ten years.)

One aspect that you left out here: GPUs might not be your worst enemy, FPGAs can also be used to crack passwords efficiently, not to be underestimated if you are expecting a dedicated attacker. And while bcrypt has been designed to be inefficient on GPUs, it can still be cracked rather efficiently on FPGAs. That’s the issue that scrypt is meant to address.

Also, the comparison in your post is misleading: the number of hashes per second means nothing if you don’t mention the corresponding number of iterations. All of these algorithms allow tweaking this, to make the task more computationally intensive as the available hardware improves over time.

Is argon2 considered proven, battle tested, and ready for wide scale adoption? The wikipedia page on it is unclear. I don’t think crypto people like to use things that are too new for that reason…

I think you misunderstood what I meant. To change the hash we use at Discourse we have to support both the old and new formats in the Discourse code (as well as any future formats).

I guess that is sort of true but it feels to me like security through obscurity. Just increase the work factor.

Definitely a neat idea, but it’d need to be a unique canary user per site.

That’s fine, and overlap with bitcoin specific hardware is definitely bad, as in, there may be some monster hashing hardware out there. Do you have any links of where to buy it, numbers, etc? The main takeaway I have is “don’t pick any password hash that remotely resembles what bitcoin uses” as that is a rich vein of madness.

This is covered several times in the post. Search for “there are two factors that go into password hash strength” if you don’t believe me.

A few notes not in the post, but I wanted to mention:

Running a long term password crack on your primary GPU (the one used to drive your video) is … surprisingly painful. Even with hashcat on “multitasking friendly and slowest” mode, video performance becomes incredibly sluggish. If you really want to do a long term (as in weeks, not days) password cracking project, you DEFINITELY should build a dedicated machine for it, in my opinion. Way way too painful on your primary machine.

Speaking of password hash cracking “in the cloud”, Amazon’s GPUs are super anemic. One GTX 1080 Ti is worth more than three AWS g2.8xlarge instances!

Hashtype: PBKDF2-HMAC-SHA256

9473.2 kH/s   8x 1080
3737.4 kH/s   16x Tesla K80, p2.16xlarge
1730.9 kH/s   1080 Ti
1173.1 kH/s   1080
 883.3 kH/s   1070
 594.5 kH/s   RX 480
 459.6 kH/s   4x GRID K520, g2.8xlarge
 304.7 kH/s   HD 6690
 114.8 kH/s   GRID K520, g2.2xlarge

See more info about Amazon’s G2 instances and compare pricing… right now the g2.8xlarge is $2.60 per hour, or $62.40 per day. There is apparently also a new P2 instance type which has up to 16 Tesla K80 GPUs which is a little better. The 8x is $7.20 per hour, and the 16x is $14.40 per hour.

I do believe blocking the top X most common passwords is the best and most efficient strategy, but it is conceivable you could run an automated, regular offline GPU crack attempt on all accounts, and then auto-reset passwords of those users whose passwords can be easily cracked. I would allocate at least one hour of GPU time per account, and obviously you’d want to use popular wordlists and masks to do so; brute force is out of the question. Another very clever idea, but it would not be trivial to set up.
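
As a toy illustration of that self-audit idea, here is a sketch that runs a wordlist against stored hashes and flags the accounts that fall. It runs on the CPU with PBKDF2 from the standard library rather than on a GPU with hashcat, and the record layout, `ITERATIONS` value, and function names are all assumptions made for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # hypothetical; kept low so the sketch runs quickly


def make_record(password: str) -> dict:
    """Create a stored (salt, digest) pair the way a site might at signup."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return {"salt": salt, "digest": digest}


def weak_accounts(records: dict, wordlist: list) -> list:
    """Return usernames whose stored hash matches any wordlist candidate."""
    flagged = []
    for username, record in records.items():
        for candidate in wordlist:
            guess = hashlib.pbkdf2_hmac(
                "sha256", candidate.encode(), record["salt"], ITERATIONS
            )
            if hmac.compare_digest(guess, record["digest"]):
                flagged.append(username)
                break  # one hit is enough to flag the account
    return flagged
```

Because every account has its own salt, each candidate has to be rehashed per account, which is exactly why the time budget per account matters.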

Finally, when it comes to password generation, obviously in a perfect world we would all use magical perfectly random password generators. Barring that, for human generated passwords, I have some suggestions:

  1. if you use a dictionary word, insert something random inside the word to make it no longer a dictionary word

  2. avoid “number at the end” or “number at the beginning”

  3. avoid “capitalize the first character”

  4. try to fold something site-specific into your password for that site, as a kind of “site hash”. Don’t just concatenate words together though – insert one word at random within the other.

Let’s say you were generating a human password for, I dunno, reddit. Rather than

Redditmonkey1985

do

reddMon5891keyit

Break up the dictionary / site-specific words, capitalize other than beginning, and put the number in somewhere other than beginning or end.

So the database has the hash and the salt. Do you store a 3rd salt in say the web configuration file? I assume it would be much harder for an attacker to get this global salt value than a database backup file? Maybe I’m wrong, if they get access to your machine they have everything, but I’m working under the assumption they got a database backup file without gaining file read access to the web server.
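
A site-wide secret kept outside the database is usually called a pepper. A minimal sketch of one way it could be wired in, assuming the pepper is mixed in with HMAC before the slow hash; the function name, the iteration count, and the `APP_PEPPER` variable mentioned in the comment are hypothetical.

```python
import hashlib
import hmac


def hash_with_pepper(
    password: str, salt: bytes, pepper: bytes, iterations: int = 100_000
) -> bytes:
    """Hash a password using a per-user salt plus a secret that is not in the DB."""
    # Keyed step: without the pepper, a stolen database alone is not enough
    # to start testing password guesses.
    peppered = hmac.new(pepper, password.encode(), hashlib.sha256).digest()
    # Slow step: the usual salted, iterated hash on top.
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, iterations)


# In a real deployment the pepper would come from configuration, for example
# a hypothetical APP_PEPPER environment variable, never from the database.
```

This only helps under the assumption stated in the question: the attacker has the database backup but not read access to the web server's configuration.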

All this trouble to get the password of a user in a discussion forum? And then what? Post silly messages on behalf of the user? This is worth the effort if the hacker is hoping the user is using the same password on another site the hacker is interested in.

Amazon’s GPUs may be super anemic… but they’re also more available and instantly-scalable than building hardware. You don’t have to be a nation-state to have thousands of GPUs… just enough money to afford to pay Amazon’s prices. And their pricing model makes it just as efficient (price-wise) to crack them in parallel as in series. Which suggests another metric of password difficulty, instead of time: money! How much would it cost to pay Amazon to use their GPUs to crack a password.
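
The money metric is easy to sketch. This assumes the benchmark rates and hourly prices quoted earlier in the thread (459.6 kH/s at $2.60 per hour for g2.8xlarge, 3737.4 kH/s at $14.40 per hour for p2.16xlarge) and that cost is the same whether the work runs in parallel or in series; the function name and the example keyspace are made up.

```python
def cost_usd(guesses: float, guesses_per_second: float, usd_per_hour: float) -> float:
    """Dollar cost of making `guesses` attempts on rented hardware."""
    hours = guesses / guesses_per_second / 3600
    return hours * usd_per_hour


# Example keyspace: every 8-character lowercase password.
keyspace = 26 ** 8

for name, rate, price in [
    ("g2.8xlarge", 459.6e3, 2.60),
    ("p2.16xlarge", 3737.4e3, 14.40),
]:
    print(f"{name}: ${cost_usd(keyspace, rate, price):,.2f} to exhaust the keyspace")
```

The rates are for PBKDF2-HMAC-SHA256 at the benchmark's iteration count, so the dollar figure scales up with whatever work factor the site actually uses.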


I know, but the table nevertheless sort of compares PBKDF, bcrypt and scrypt cracking performance without mentioning the number of iterations - that’s just pointless. I would actually like to use your numbers to validate my own approach, but without knowing the number of iterations this isn’t possible.

This article appears to have the necessary info. There is also this one which appears to be a follow-up. That’s all I know, didn’t try it out myself.

And then try to hack this user’s email account because they are likely reusing passwords.


That is a question about the built in hashcat benchmark function, see Benchmark compare algorithms and browse the actual values in the table at

https://hashcat.net/wiki/doku.php?id=example_hashes

In other words, 1000 PBKDF iterations. Good to know.
bcrypt uses 2^5 iterations (yes, that’s 32 of them). And scrypt appears to be using 1024 iterations.

All of these values are way below current recommendations of course. In particular, given the low number of bcrypt iterations, you will probably get better results if you run it on the CPU.
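
To put numbers on why the iteration count matters, here is a rough sketch. It assumes cost scales linearly with iterations (reasonable for PBKDF2) and takes the thread's 1080 Ti figure of 1730.9 kH/s at 1000 iterations at face value; the target iteration counts are made up for illustration.

```python
def rate_at(
    benchmark_rate: float, benchmark_iterations: int, target_iterations: int
) -> float:
    """Scale a benchmark guess rate to a different iteration count."""
    return benchmark_rate * benchmark_iterations / target_iterations


benchmark = 1730.9e3  # guesses per second at 1000 PBKDF2 iterations

for iterations in (1_000, 100_000, 600_000):
    rate = rate_at(benchmark, 1_000, iterations)
    print(f"{iterations:>7} iterations: {rate:,.0f} guesses/s")
```

The same card that looks terrifying at the benchmark setting is hundreds of times slower against a realistic work factor, which is why a rate without its iteration count cannot be compared across algorithms.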


This is why I like:

  1. having two-factor authentication myself for anything important.
  2. not having to have yet another password to deal with when I go to another service (thanks for making this with OpenID Connect)

First, I applaud you for actually testing your hypothesis. You abided by the most critical question in a scientific study: how do I know what I think is true is actually true? Hubris is the enemy of security and you took measures to validate your assumptions with outside sources. Kudos!

Second, I’m curious about your experiment with the expert. Did they only go through a single iteration? Did they have the usernames that go with the passwords? Did they analyze the content on Discourse for those users? Did they analyze any other sources from the same users?
If you study how this white hat hacker worked (https://arstechnica.com/security/2013/05/how-crackers-make-minced-meat-out-of-your-passwords/), you see that they ran through four different iterations wherein they analyzed cracked passwords for patterns. Did this security expert use your various suggested password patterns as part of their attack?

My argument is that improving the hashing algorithm is definitely important and will absolutely improve security, but it is a Sisyphean endeavor because it relies on a wholly unreliable source of limited capability: the human. As machine learning gets better at anticipating the types of passwords humans will use, it will improve the already unbelievable ability to crack passwords.

There is a saying that the safest password is one that has never been cracked. Of all the requirements you make on Discourse users, perhaps the best is denying them the ability to use the top 10K cracked passwords. Expand that list, and I suspect it will greatly improve security. Unfortunately, with IoT, there are soon to be billions of passwords out there, and some subset of those will be cracked. That means choosing a password that hasn’t been used will get tougher. So much so that you get closer to randomly choosing a password, and at that point you might as well use a password manager.

Again, to add significantly more security IMO, you need to add additional factors. Where you are connecting, what you have, who you are, patterns of entry etc. Only additional factors will make password crackers mostly obsolete.

Great article. We did the exact same thing over at our company, “hackers, hacking thyself”. Super eye opening, adjusted some of our IT policies and practices. This is valuable for wherever you are in infosec: software development, security researcher, or infosec employee for a company. A little elbow grease gives one great insight.


Maybe we thought about this differently. Maybe we were less secure than we realized.
Now, if we lost the source code to the site, we were hosed, they would know the secret sauce.

We basically did it this way:
Given a UserName & Password, we returned a user record ONLY using UserName.

We took Password, and generated the following String:

Salt_Pre + UserName + customHash(userID, Password) + Password + UserID*X + Salt_Post

Sizes: 10 + 3-50 + 8 + 6-unl + 6 + 10 -> 40 - 50 character String, to be MD5’d by the DB.

Now, Salt_Pre and Salt_Post were UNIQUE sitewide salts.

and customHash() was a hash worked out from a Sedgewick book, and the UserID was MODed to give us a really User specific hash value.

the thought process being that if you had the DB we could not envision you getting to the password, without knowing this.

And the login code, of course added: where userId = :UserID and Passwordhash = MD5(:Password);

so what was stored in the DB was the MD5 of that final string.

Obviously you cannot alter the Username, but it is a Key anyways. And it is case sensitive.
But our thought process was to add variables that we would KNOW in the formula, but the hackers would not think of.

If I had to do it today, I would probably have a GUID table where I lookup using UserID variants to get a set of GUIDS
to add to each successive hash (for each GUID G: H = Hash(Pwd)+Hash(H)+Hash(G); ) of course with salt to start and end the process… the goal, for me is to have ENORMOUSLY long strings that get hashed into an MD5() [or much better for the final step]

IE, don’t make the user provide the obnoxious length, let the system do it. And it’s OKAY if it is COSTLY on the CPU, in fact, that is even better!

PS: Canary accounts are great, but users can create their own account with their own password. If they do that before they steal the DB, they can then find their own account in it and HACK on that one account until they find the path, but our extra stuff will seem like salt. That is why we felt things should change, and be long, per customer.

One problem here - how are you going to distinguish the “canary” users from the real users… in a way that isn’t stored in the database, which we’ve already assumed the attacker has taken.

Another interesting protection is to add asymmetric encryption.

For my backups, I have generated a GnuPG key. I do my backups using .tar.gz or 7Zip on Windows, then the archive itself is encrypted with the public key. The archive is then moved to the backup server. If the server is compromised, they need the private key to decipher it. That private key is kept on a machine not linked to any network. If I have to restore something, I copy a backup onto a USB3 disk, go to the deciphering machine, and use the private key there to decrypt the backup, then I do the recovery manually.

I do the same for logs: the logs are compressed, encrypted, and sent to a server that does not allow remote access. You must go to the server physically to check logs. A diff between the log on the server and the backup log is something that we have automated, and it’s checked daily. Because the log server accepts incoming logs, it is protected against denial of service in case it is spammed with logs.

Asymmetric encryption is very interesting because any data can be protected on the backup server, and you need the private key to get to the data. If you keep the private key on a machine with no remote access and no network, and use that machine to read the data you backed up, it’s quite effective.


If I were to speculate, I’d say:

  1. Salt and encrypt the username before it hits the database, where neither the salt nor the encryption algorithm sequence is stored in the database,
  2. Use other salts and encryption sequences on canary users.

This assumes that the salting and encryption sequences themselves aren’t compromised, and that Eve isn’t watching traffic to the database looking up hashed usernames at the same time she submits an unhashed username…

Litecoin (and derivatives: Dogecoin, et al.) use scrypt as their hashing algorithm, which has greatly driven down the cost of scrypt ASIC hardware. It wouldn’t be too far-fetched to imagine them being repurposed for password cracking.
