Designing For Evil

codinghorror · May 28, 2008, 12:00am

Have you ever used Craigslist? It's an almost entirely free, mostly anonymous classified advertising service which evolved from an early internet phenomenon into a service so powerful it is often accused of single-handedly destroying the newspaper business. Unfortunately, these same characteristics also make Craigslist a particularly juicy target for spammers and evildoers. Who knows; maybe it's karma.

This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2008/05/designing-for-evil.html

BCS · May 29, 2008, 12:00am

Follow the money. The spam must have a way to find whoever is paying for it (or why would they pay?).
Spam the spammer. Reply to the spam with as much useless information as you legally can. Give them 10M useless email addresses. If you can do it morally, DDoS them. Set up voluntary botnets (nospam@home) that make it as costly as possible to get anything useful out of spam.

Dubs · May 29, 2008, 12:00am

“Say goodbye to your two best friends! And I don’t mean your pals in the Winnebago!!” --Dark Helmet

MrGreg · May 29, 2008, 12:00am

It seems to me the only realistic way to prevent spamming is to have some sort of vigilante task force that identifies the root source of the spam and posts their contact info publicly.

If there was a known spammer who lived near me, even if they didn’t affect sites I use, I would gladly do things to make their life miserable. Maybe not the most legal of solutions, but probably effective. Of course we would need support around the world, too.

Sean · May 29, 2008, 12:00am

The real solution is to start charging… 50 cents an ad isn’t burdensome to the public, but devastating for spammers.

Right now spammers see themselves in a gray area. Might be breaking the law, but odds of being prosecuted are small. Few spammers would cross the line into credit card fraud.

Why aren’t captchkas simple questions that would require AI far more complex than what we have right now? For example, What was the last name of the wife of the 41st President of the United States?

Or if that’s too easy, something more complex, but that would still be easily solved by a google search.

MarkT · May 29, 2008, 12:00am

(my favorite at the moment is “no murderers please!”)

It is a funny thing to say, but unfortunately it’s also a valid concern. There are worse things than spammers. A girl I knew in high school was murdered as a result of responding to the wrong person’s ad on craigslist:

Larry_H · May 29, 2008, 12:00am

Spam identification is a computationally difficult task, so use it as your CAPTCHA.

Present the user with 4 messages, 1 known spam, 1 known ham and 2 others, and have them classify them. So long as they get the known spam and ham correct, you have a reasonable chance that they are human and that they have classified the unknowns correctly.
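A minimal sketch of that idea, assuming the site already holds pools of labeled and unlabeled messages. The names `Message`, `build_challenge`, and `grade` are hypothetical and only illustrate the control-message check:

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Message:
    text: str
    label: Optional[str]  # "spam", "ham", or None when the label is unknown

def build_challenge(known_spam, known_ham, unknown, rng=random):
    """Pick 1 known spam, 1 known ham and 2 unlabeled messages, shuffled."""
    picks = [rng.choice(known_spam), rng.choice(known_ham), *rng.sample(unknown, 2)]
    rng.shuffle(picks)
    return picks

def grade(challenge, answers):
    """Return (passed, votes).

    answers maps message text to the user's "spam" or "ham" verdict.
    The user passes only if both control messages are classified correctly;
    votes holds the verdicts on the unlabeled messages for later tallying.
    """
    controls = [m for m in challenge if m.label is not None]
    passed = all(answers.get(m.text) == m.label for m in controls)
    if not passed:
        return False, {}
    votes = {m.text: answers[m.text]
             for m in challenge if m.label is None and m.text in answers}
    return True, votes
```

Note that a bot guessing at random still gets both controls right one time in four, so the votes on the unknowns would need to be tallied across many users before being trusted.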

Black_Hate · May 29, 2008, 12:00am

How do the black hats avoid their own message boards filling up with spam? Is it just the “good is dumb” thing and we “good guys” refuse to get down to their level and mess up their forums for swapping hints about how to mess up others?

pookleblinky1 · May 29, 2008, 12:00am

Sometimes I think the only way to end spam would be to amass a private army of henchmen-hackers, capable of tracing down commenters’ locations and napalming the place unless innocence can be proved.

Remember that Russian spammer who got murdered, and the police admitted that literally millions of people have a motive? http://www.securityfocus.com/news/11256

If only we organized our hate…

Dave · May 29, 2008, 12:00am

I find it amazing how many ideas were posted on this blog alone (ones I hadn’t thought of), yet there is some flaw in each and every one. Although the flash one or java one seem kinda convincing. Unfortunately, the better computer AI gets, the harder humans will have to work to prove their existence.

MaxK · May 29, 2008, 12:00am

I think one of the advantages that stackoverflow will have is that its subject matter is very limited–having a “discuss anything” area would be a bad idea, IMO, for exactly that reason.

In the Personals sections of websites, two things are problematic: it involves email, and it involves sex-related activities. Those are two things that spammers love and have.

The Services section also involves email, and is pretty easy to fake, as certain services are often desired and you’re going to get a lot of hits on those.

“For Sale” is harder to fake–each individual ad is unlikely to get many hits (with the exception of concert tickets or currently-popular electronics).

stackoverflow postings will be less likely to contain URLs or require email, and will definitely not be sex-related (let’s hope)! I think the very nature of the site will make your job a lot easier than craigslist. It will also make it a lot easier for human beings to detect that something is spam.

The one thing to think about is attachments–if you allow anybody to attach anything, that allows people to attach redirects to websites and also to attach malicious JS that will operate in your domain.

-Max

AnonymousC37 · May 29, 2008, 12:00am

Why aren’t captchkas [sic] simple questions that would require AI far more complex than what we have right now?

Because coming up with hard questions is even harder than answering hard questions. Click on my URL to see how a bot can answer your sample question automatically. You had to employ a human brain to invent that question, and yet a machine cracked it in 0.27 seconds! Now imagine trying to make a question so hard that a machine couldn’t answer it… and then try to imagine making a machine that could make questions so hard that a machine couldn’t answer them.

There’s plenty of research on questions machines can’t answer; Google “The balloon hit a branch and burst.” for more information. There’s much less (useful) research on coming up with these questions in the first place.

But this is all academic. The right solution, as others have said, is (1) stop frequenting sites where spam is indistinguishable from ham, and (2) use human moderators to mod down spam, optionally using (2) to train a filter at the same time. It works for Wikipedia, YouTube, most competent blogs…

Tony · May 29, 2008, 12:00am

Spam is a severe problem, but I have noticed a few occasions where otherwise secure systems have had holes in their spammer protection. I used to run a forum which received about 200 posts a day, and the spam protection hadn’t failed. However, I noticed a few months later that the software’s knowledge base feature (which barely anyone used or looked at) had been overrun with spam, as the captcha was missing on the submit link for the knowledge base. I ended up removing the section altogether as it wasn’t any use, but if it had been a more public section such as the downloads db it would have caused a lot more trouble. (It was a gaming forum and it hosted a good few modifications.)

vamsi · May 29, 2008, 12:00am

I see that the problem that needs to be solved is differentiating a human from a bot/machine. I was thinking of predetermined places to click in a Flash movie, and then using the sequence of positions on the screen (which changes on every reload for better spam protection). I believe this will give better protection since it will also be timed.
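A rough sketch of the server side of such a check, assuming the client reports the clicked positions back as (x, y) pairs. The function names, the pixel tolerance, and the time limits are all made up for illustration:

```python
import random

def new_challenge(issued_at, n_targets=3, width=300, height=200, rng=random):
    """Pick fresh random click targets; call again on every page reload."""
    targets = [(rng.randrange(width), rng.randrange(height)) for _ in range(n_targets)]
    return {"targets": targets, "issued_at": issued_at}

def verify(challenge, clicks, submitted_at,
           tolerance=15, min_seconds=1.0, max_seconds=60.0):
    """Accept only if the clicks hit the targets in order, within the time window."""
    elapsed = submitted_at - challenge["issued_at"]
    if not (min_seconds <= elapsed <= max_seconds):
        return False  # answered implausibly fast, or the challenge expired
    if len(clicks) != len(challenge["targets"]):
        return False
    return all(abs(cx - tx) <= tolerance and abs(cy - ty) <= tolerance
               for (cx, cy), (tx, ty) in zip(clicks, challenge["targets"]))
```

This only covers the bookkeeping. Whether it stops a bot depends on how hard it is for software to locate the targets in whatever gets rendered to the user.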

codinghorror · May 29, 2008, 12:00am

I’m a little disappointed this has turned into an extended discussion of CAPTCHA. If you have time, do refer to my previous articles on CAPTCHA, which covered all the suggestions (and more) I’ve seen outlined here:

CAPTCHA Effectiveness
http://blog.codinghorror.com/captcha-effectiveness/

Has CAPTCHA Been “Broken”?
http://blog.codinghorror.com/has-captcha-been-broken/

CAPTCHA is Dead, Long Live CAPTCHA!
http://blog.codinghorror.com/captcha-is-dead-long-live-captcha/

In short, the Google CAPTCHA still works – the amount of time necessary to get a response back from the “breaking” services is indicative of human intervention – although some of the lesser ones have definitely fallen.

For stackoverflow, it’s likely we’ll use a lightweight “invisible” JavaScript captcha.
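The comment above does not say how such a check would work. As one possible sketch, assuming a script on the page copies a server-signed token into a hidden `js_token` field and that a CSS-hidden honeypot field named `website` is left empty by real users, the server side could look roughly like this (all names and limits here are hypothetical):

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical server-side secret

def issue_token(now=None):
    """Sign the issue time; a page script would place this in a hidden field."""
    ts = str(int(time.time() if now is None else now))
    sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"

def looks_human(form, now=None, min_age=2, max_age=3600):
    """Check the honeypot, the token signature, and the time taken to submit."""
    if form.get("website"):  # honeypot filled in: likely a form-filling bot
        return False
    ts, _, sig = form.get("js_token", "").partition(":")
    expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False  # token missing (script never ran) or tampered with
    age = (time.time() if now is None else now) - int(ts)
    return min_age <= age <= max_age
```

A check like this filters out bots that never execute scripts; it would not stop one that drives a real browser.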

PaulW · May 29, 2008, 12:00am

The thing I’ve always thought about spam is that whilst software struggles to recognise it, humans can almost always spot it immediately. So I figure your best bet is to make it as easy as possible for humans to flag spam. I speak from no experience.

I do, however, have experience of using SpamSieve on my Mac. It does the heuristic thing of learning from what I flag as spam. Very few false negatives or false positives, although admittedly my spam traffic is peanuts compared to what a popular forum might receive.
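To give a feel for how a filter can learn from flags, here is a tiny naive Bayes sketch with add-one smoothing. It is not a description of how SpamSieve works internally; the class name and the whitespace tokenizer are assumptions made for the example:

```python
import math
from collections import Counter

class FlagTrainedFilter:
    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message that a human flagged as "spam" or "ham"."""
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_score(self, text):
        """Log-odds that text is spam; a positive value leans towards spam."""
        vocab = len(set(self.words["spam"]) | set(self.words["ham"])) or 1
        totals = {k: sum(c.values()) for k, c in self.words.items()}
        # Prior from how many messages of each kind were flagged (smoothed).
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for w in text.lower().split():
            p_spam = (self.words["spam"][w] + 1) / (totals["spam"] + vocab)
            p_ham = (self.words["ham"][w] + 1) / (totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score
```

Each flag becomes one `train` call, so the more people flag, the better the scores get.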

BSD · May 29, 2008, 12:00am

Your suggestions (and more) have either already been implemented to no effect by Craigslist, or were not applicable in the first place.

What is a developer to do when there’s nothing left to fight with and/or your resources are far outmatched by the spammers?

Mecki · May 29, 2008, 12:00am

Well Jeff, with all that knowledge, when will you change the keyword to enter? It has been the same ever since my first comment to the page. I could write a SPAM tool right now. Actually I don’t even need a tool for that. A simple Bash shell script with curl to post text to the page in a for-loop will do.

Those CAPTCHAs are getting more and more useless. The better OCR software gets, the more useless they will become. And one day the only way to make them unreadable for OCR software will be to make them unreadable for human beings, too. Besides that, they don’t work well for people with disabilities.

The biggest problem is this: say I’m a spammer and I want to spam a forum with CAPTCHAs that no OCR software can handle. No problem. I make a simple porn page and ask people on every access to the page to first solve a CAPTCHA. In fact this is not my CAPTCHA, but the one from the forum I want to spam. That way people are helping me to solve the CAPTCHA and spam the forum. Pretty easy, isn’t it? Would also work on your page. The problem here is that CAPTCHAs don’t tell people where they come from (what page or service they try to secure). A CAPTCHA should contain the URL of the page it belongs to!

Still, CAPTCHAs are not the way to go. Anyone thinking about alternatives to these? It must be something a human being can easily solve, but that’s almost impossible for a computer to solve. Not a trivial task.

RahulC · May 29, 2008, 12:00am

I have to ask.
I mean (I know it is going to sound insanely naive), I get that spam is incredibly lucrative, but how exactly?
What is the revenue model (not just Craigslist, but mail, trackbacks, comments, wherever they spam)?
I don’t know anyone who responds to a spam ad.
Who is the customer behind all this money that spammers are making?
Clearly there’s tons and tons of money in it for it to be such a concerted effort, but um, how?
Thanks

ballmer · May 29, 2008, 12:00am

If you’re going to include CAPTCHA verification on stackoverflow, you should think about using http://en.wikipedia.org/wiki/Recaptcha . It’s a great way to combine the verification with getting actual work done.