Designing For Evil

I’ve been studying PHP’s cURL (Client URL) library, some Python, and ASP.NET’s HttpWebRequest to figure out how form bots work. I think you have to build a few bots yourself to thoroughly understand how they work and which countermeasures will defeat them. For example, JavaScript validation will not affect a bot. You also need to understand cookie jars, user agent spoofing, and referer spoofing.

But I still don’t understand how IP proxy sites can be used by bots, and this matters because many people still think IP address blocking is effective.

Every ecology develops parasites. Unless we, as a culture, dedicate ourselves to tracking down and neutralizing anyone who games a system, the parasites win and flourish.

What we can do now is to develop resistance mechanisms - something that’s already happening.

Wait until we start throwing AIs into the mix. That’s when things will really get interesting.

I used to run a dating site, and I used many techniques I developed myself. The legality of some may be questionable. My favorite was “poisoning” an account: the user would have no clue that anything was wrong, they could post and interact as normal, but no one would see what they wrote (except other poisoned people). “Discourage” mode would randomly delay a user’s page loads and discard a percentage of their posts - a way to make people WANT to leave, rather than want to get back in with another account. I also assigned each user a hidden “risk” score based on a variety of factors like IP country, offset by “trust” gained by being a decent member for a while. I never took the easy, pseudo way out by banning entire countries or IP blocks.
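The poisoned-account idea can be sketched as a tiny in-memory model. Everything here is my own illustration - the `Member` fields and the `visible_posts` helper are invented, not taken from that site:

```python
class Member:
    def __init__(self, name, poisoned=False):
        self.name = name
        self.poisoned = poisoned
        self.risk = 0.0   # raised by signals like IP country
        self.trust = 0.0  # earned by being a decent member for a while

def visible_posts(viewer, posts):
    """Poisoned authors see everything (so nothing looks wrong to them);
    normal members simply never see a poisoned author's posts."""
    if viewer.poisoned:
        return posts
    return [p for p in posts if not p["author"].poisoned]
```

The key property is the asymmetry: the poisoned user gets a completely normal-looking site, so they never learn they should evade the ban.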

I think the best way to handle these things, by far, is to not let the enemy know you’re on to them. Don’t give them a reason to upgrade their weaponry. Let them waste their time and get shoddy results. However, this is deceptive and could be illegal; it might not fly at a big corporation.

I also developed a system to automatically poison accounts whose messages exceeded a risk threshold for phrases like “nigerian prince” and “millionaire”. I had 100,000 members and I kept things pretty clean. I never needed a CAPTCHA, maybe because signup involved some custom JavaScript image clicking.


I have experience in implementing anti-spam filters.

What works:

  • throttling individual IPs, network blocks, unique e-mails, e-mail domains, usernames, etc., with different limits for each (and per hour, per day). Trending is really powerful.

  • banning of IPs (look at X-Forwarded-For too, and see the XFF project).
    Spammers eventually run out of open proxies and cheap VPSes.
    Unfortunately you have to keep a large whitelist and lift bans when an IP stops spamming (because hijacked Windows machines spam from average Joes’ IPs).

  • statistical (Bayesian) filtering works well if you use 2- or 3-word sequences. If you have a lot of incoming ham and spam, the occasional spammer trying to game the filter won’t skew it, and it might even learn to recognize these obvious attempts.
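That last point can be sketched as a toy naive Bayes classifier over 2-word sequences, with add-one smoothing. This is a minimal illustration of mine, not any production filter:

```python
import math
from collections import Counter

def bigrams(text):
    words = text.lower().split()
    return list(zip(words, words[1:]))

class BigramBayes:
    """Toy Bayesian spam filter over 2-word sequences."""
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, label, text):
        for bg in bigrams(text):
            self.counts[label][bg] += 1
            self.totals[label] += 1

    def spam_score(self, text):
        """Sum of log-likelihood ratios; > 0 leans spam, < 0 leans ham."""
        score = 0.0
        for bg in bigrams(text):
            p_spam = (self.counts["spam"][bg] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][bg] + 1) / (self.totals["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score
```

Word pairs like “nigerian prince” carry far more signal than either word alone, which is why 2- or 3-word sequences resist gaming better than single tokens.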

I have to ask why this blog seems to get so few spam comments when the CAPTCHA word is always ‘orange’.

This is probably either a terrible idea or one that goes against some of the principles behind Stack Overflow. So think of it as an idea for some other site, preferably one so self-assured that it doesn’t mind a) making users jump through a few hoops to sign up and b) thinks it can charge a small fee for people to contribute and still draw them in.

The idea is that to contribute you have to pay a small nominal amount of money - I’m thinking $2 to $5 - as a kind of good behaviour bond.

If you want the money back or want to stop contributing you cancel the account and 14 days later you get your money back. If you’ve been flagged as having done nasty stuff then you don’t get your money back.

A few bonuses:

  • interest on the money can be used to help run the site.
  • the cost is a disincentive for people who might otherwise poke around and look for exploits
  • if you need both an email address and a credit card to sign up (and if there’s some notion of uniqueness for both), then you’ve got something approaching two-factor auth.

A few drawbacks:

  • a hassle to sign up
  • locks out people who don’t / won’t / can’t use a credit card online
  • more to be managed for the site, including more security and especially accounting headaches.

So not a serious suggestion, just something to think about.

I guess if you wanted less hassle you could use an invite-only model, allow only a small number of invites per user, and prune the tree and/or penalize people who invited spammers/hoodlums. But then you need arbitration for false accusations…

Also, I think craigslist bears most of the blame here. Either their programmers truly suck, or Craig is holding the reins too tightly.

Require user accounts, with a long, slow verification process, instead of annoying verification for every post. And for Christ’s sake, add some features. The internet now has image capability, Craig. Everybody with these big sites is so afraid to change ANYTHING because their business model might explode. Have some balls.

A lot of commenters miss the point that SOF and other participation websites need to reduce the barrier to entry, not raise it. The spammers will always learn the ropes, so making an overly convoluted path to normal participation just reduces real participation, because people either aren’t used to the process or can’t be arsed, and the site dies. Spam drops off, but only because the spammers realize the site is a waste of their time.

I can’t begin to count the number of BBs out there I’ve never bothered with because I don’t want to sign up; I don’t want an account with them. I’ll never come back unless Google indexes it and I land there by mistake.

@dnm - I don’t think you want the poster to be the primary moderator.

  1. spammers will delete complaints that it’s spam
  2. posters will delete comments that their post is dog crap
  3. posters will gain enough ‘kudos’ to be allowed to spam, and then spam everywhere (who watches the watchers?)
  4. You also have to guard against ‘ganging up’ on legitimate users. I’m sure there are a lot of spammers who are decent coders and could infiltrate the site to gain high moderation status and abuse that power. It’s also a concern in general, because you get some right twits in this industry who think they’re Jesus’s little brother in terms of worldly importance. They are more evil than spammers by far.

In a way I don’t care if you’ve posted 1000 times or just the once. The only thing that matters is if you have something important to say. Participation, while great, isn’t everything, and just means you have heaps of free time.

micro-payments:
I’m not giving my credit card info to some random company just so I can post. It’s an idiotic suggestion. Do you trust every website you visit with your credit card details? Facebook is huge, and there’s no way I’d hand over that sort of info. I might think well of Jeff/Joel, but I’m not paying to make my 2 cents known.

Jeff is soliciting my response in the first place by placing the comment box there. Let’s not forget Jeff gets paid through ad revenue driven by site traffic. I dare say this site wouldn’t drive as much traffic if it were devoid of the little comment box.

Or maybe you think the free-ness of the internet should really be 2-tier. Those with a voice are those that can afford one.

So:

  • commenting should be free of login if one wishes
  • if people want accounts, maybe they get some minor privilege elevation (like pseudo-moderation) and the ability to post.

I like @FrenchHorn’s poisoned accounts, the alternate reality / honeypot. While I can see ways around it, it’s pretty good to let the spammers think they’re on top of things while in reality they’re not.

Ascii captcha is also damn brilliant. Not foolproof, but nothing can be.

I wonder if there could be a variant of the old-school style of verification I remember on video games: where they make you type in the third word of the second sentence on page 42, etc. The rendered page content becomes part of the captcha, which makes anonymous mechanical turks a little less effective (the unpaid variety at least).
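A rough sketch of that idea, picking the challenge from the page’s own rendered text. The helper name and challenge format are mine, purely illustrative:

```python
import random
import re

def make_page_captcha(page_text, rng=random):
    """Ask for the k-th word of the j-th sentence of the rendered page,
    so the challenge changes whenever the content does."""
    sentences = [s.split() for s in re.split(r"[.!?]+", page_text)
                 if len(s.split()) >= 3]
    s_idx = rng.randrange(len(sentences))
    w_idx = rng.randrange(len(sentences[s_idx]))
    question = f"Type word {w_idx + 1} of sentence {s_idx + 1} on this page."
    answer = sentences[s_idx][w_idx].lower()
    return question, answer
```

Since the answer is derived from the page itself, a turk who is shown only the challenge image (without the full page) cannot answer it, which is what blunts the outsourcing attack.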

Spam prevention has to be easy for the user, easy on admins, and it mustn’t significantly raise the bar to participation. Remember that if a person can figure it out, so can a machine, because the machine is still programmed by a human.

I’d like to see a site that used clever CSS and extra textbox honeypots to make it hard for a bot to tell which fields it should be putting data into. If you get data back in a field that human users shouldn’t even see, dump the submission in the bin. That’d be tricky from an accessibility standpoint, though.
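A minimal sketch of the honeypot, assuming a hidden field named `website` that CSS pushes off-screen. The field name, markup, and helper are all illustrative:

```python
# Hypothetical form markup: the "website" field is invisible to humans
# (off-screen, skipped by tab order) but present in the DOM for bots.
FORM_HTML = """
<form method="post" action="/comment">
  <input name="comment">
  <input name="website" style="position:absolute; left:-9999px"
         tabindex="-1" autocomplete="off" aria-hidden="true">
</form>
"""

HONEYPOT_FIELD = "website"

def looks_like_bot(form_data):
    """Reject any submission that filled in the invisible field,
    since bots tend to fill every field they find."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())
```

The accessibility concern is real: screen readers can still announce the field unless it is also marked up to be skipped, which is why the sketch adds `aria-hidden` and a negative `tabindex`.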

HB:
What about something like this: when a user makes a post, it sends an e-mail to their account with a one-time link that they need to click to activate the post?

I know you can read the e-mail programmatically to find the link to click, but it at least makes certain they have a valid e-mail address.

I have seen this, or something very similar, on craigslist already… and it has been defeated as well. It also gets into territory where it’s way easier just to sign up for an account, which from what I gather does not meet Jeff’s goal; he simply wants to avoid forcing a walled garden on people, by allowing anonymous posters.

I think the people who say it is obviously spam are out of touch with the common user.

You know those banner ads that pretend to be Windows Update notifications? I’ve watched people click them. I tried to stop them, but I was too slow.

While it may not be a problem for a site specifically designed for computer programmers, not everyone on the internet is as savvy. Most people don’t expect to be fooled.

Like some of the posters here have pointed out, I believe spam is mostly an economic problem, and it will take an economic solution to fix it. CAPTCHAs and other forum/commenting moderation systems are really band-aids and incremental advances in stemming the spam tide.

I think the easiest way to eliminate spam (or reduce it to minuscule levels) is to remove the financial incentive behind it. However, instead of targeting the spammers by making each posting a monetary transaction, we should be targeting the people who buy goods from these spamming operations.

(Rough figures.) Since it takes only one person buying something from a spam advertisement to turn a profit for the spammer, we should target that one person and fine/educate them.

Though somehow I don’t see this solution being easy or simple by any means.

The other problem is that when someone does come up with an effective method for deterring spam that can’t be worked around, the spammers fight back, and fight back hard.

Selling (and maybe using) automated spamming software should be a felony with harsh penalties and it should be strictly enforced.

Wikipedia is a living testament to the fact that goodness vastly outnumbers evil.

Not everyone agrees that Wikipedia is good; these intelligent (design) folks seem to think it’s evil, so they’ve started their own wiki trunk:

The following is a growing list of examples of liberal bias, deceit, silly gossip, and blatant errors on Wikipedia. Wikipedia has been called the National Enquirer of the Internet:

http://www.conservapedia.com/Bias_in_Wikipedia

Couldn’t resist… Interesting post, Jeff.

It seems that the best way to beat an automated tool is with a human response. Why not disallow anonymous postings and require an account to post new messages?

New users would have their posts moderated, and they could only respond to an existing thread or subject. When they post a message, only the person who originated the thread would be able to see it. They would decide whether it’s real or spam and take the appropriate action. If it’s spam, the account gets tossed. If it’s a real message, it gets marked as visible to the rest of the community. Messages not acted on within 48 hours get automatically purged.

New users would have some sort of threshold where their first 3 to 5 messages need to be moderated before they become standard users. It does pass some extra responsibility to the person starting the thread, but you get to load-balance message moderation across the user base. You could even open up moderation so that anyone who had previously participated in that thread could moderate new messages from unvalidated users.

Granted, this would be an annoyance. But it’s a short term annoyance. I would put up with some initial annoyance if I knew that it would keep out the widows of Nigerian Princes.

Nobody will accept my spam defeating technique.

MAKE IT LEGAL TO VIOLENTLY MURDER SPAMMERS.

If you want to stop link spam, just disallow HTML except from trusted users… A healthy portion of existing internet content is already advertising; I don’t know why people think the situation would be different with user-contributed content. It seems like bloggers like you (I read and enjoy your blog regularly) want to have it both ways: you want the benefits of user-contributed content, but you don’t want to do the manual work necessary to police it. Think about graffiti: how do people handle graffiti? I bet they handle it more with scrub brushes than with laws. Ever see those smart businesses with a wall so attractive to graffiti that their best strategy is to hire a talented graffiti artist to create a mural? Quit pretending that people submitting forms to you are committing some sort of crime, and start thinking about a way to turn it to your advantage.

Ascii art captcha, that is awesome!

Regarding using images of cats: let’s say there are 6 images, 2-3 of which are cats. If you only have 100 cat images, a bot could theoretically MD5 them all, and that works whenever you serve the 6 images as separate files.

So instead of serving them separately, ‘glue’ the images together into one large physical image with a scriptable image-manipulation program before serving. At the same time, you can generate the supporting code needed. If a few of the sample images are procedural, along with the background for the whole image, that trashes the MD5 trick.
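A sketch of the bookkeeping behind that glue step. The tile size and helper names are assumptions of mine; the actual pixel compositing would be done with an image library:

```python
import random

TILE = 100  # assumed tile width/height in pixels

def build_challenge(image_ids, cat_ids, rng=random):
    """Shuffle the images into a 1x6 strip; tile i occupies
    x in [i*TILE, (i+1)*TILE). Remember which tiles are cats."""
    order = list(image_ids)
    rng.shuffle(order)
    answer = {i for i, img in enumerate(order) if img in cat_ids}
    return order, answer

def clicked_tiles(clicks):
    """Map click x-coordinates on the strip back to tile indices."""
    return {x // TILE for x, y in clicks}
```

Because the client only ever sees one composite image and a set of click coordinates, there is no per-image file for a bot to hash and recognize.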

All CAPTCHAs can be beaten. It’s just a matter of cost. So use that.

Ship the commenter a product of two primes and the JavaScript to extract the primes. Pick primes large enough that factoring takes about 15 seconds. Users won’t care, but you’ve just cost the spammer 15 computer-seconds per post. That jacks up the cost of spamming. If you can find other problems that are hard to solve and easy to check, start using them as well. Even better, find someone with such problems who will pay to have them solved.
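The factoring proof-of-work can be sketched like this (tiny numbers for illustration; a real deployment would size the primes so the search costs roughly 15 seconds, and the client side would run in JavaScript):

```python
def trial_factor(n):
    """Client side: naive trial division to split a semiprime n = p * q.
    The cost grows with the smaller prime, which is the whole point."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    return n, 1  # n itself was prime (shouldn't happen for a semiprime)

def verify(n, p, q):
    """Server side: checking the answer is cheap even though finding it isn't."""
    return p > 1 and q > 1 and p * q == n
```

The asymmetry is the appeal: the server does one multiplication to verify, while the poster burns CPU, so the cost lands almost entirely on whoever posts in bulk.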

(This had better be my last post, or y’all will think /I’m/ a spam bot.)