Designing For Evil

re: Mecki’s question…I wonder if something subject-matter specific would solve the “mechanical turk pr0n site” way to get around captchas? If you’re posting in a vaguely .net related forum then “which of the following words is not a reserved word in C#” - that kind of thing. Might also help improve the signal/noise.

@Rahul Chandran:

Spam is so insanely cheap to produce that it just takes one or two idiots per million clicking on it to make money.

I wonder if something subject-matter specific would solve the “mechanical turk pr0n site” way to get around captchas? If you’re posting in a vaguely .net related forum then “which of the following words is not a reserved word in C#” - that kind of thing. Might also help improve the signal/noise.

That’s actually an excellent idea!

It also helps to determine which pr0n sites (if any) are being used to bypass the security. Not quite sure what good that’ll do, but anyway…

This project from Microsoft Research looks promising, using images of cats, instead of words, to identify humans: http://research.microsoft.com/en-us/um/redmond/projects/asirra/

Mecki: your requirement is not entirely accurate. You want a problem that is simple for humans to solve, simple for machines to verify, and hard for machines to solve. Otherwise, you could require mathematical proofs to post things. Such proofs are certainly machine-safe; however, you cannot verify them mechanically in general.

Furthermore, given a problem with the above characteristics, you will also want a large set of possible answers to lower the probability of just guessing the captcha right. Just asking “does this calculation equal 2?” will only stop about half the spam, because a bot can simply guess (assuming a uniform distribution of expressions that equal two and expressions that don’t). The larger this set is and the closer your distribution of answers is to uniform, the more spam will be stopped.
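
To put rough numbers on that (a minimal sketch; the answer counts below are only illustrative):

```python
# Chance that blind guessing passes the captcha, assuming answers are
# uniformly distributed over `num_answers` possibilities.
def guess_pass_rate(num_answers: int, attempts: int = 1) -> float:
    """Probability that at least one of `attempts` random guesses is right."""
    return 1 - (1 - 1 / num_answers) ** attempts

print(guess_pass_rate(2))       # yes/no question: 0.5 (half the spam gets through)
print(guess_pass_rate(4))       # four choices: 0.25
print(guess_pass_rate(400))     # 400 possible answers: 0.0025
print(guess_pass_rate(4, 10))   # four choices, ten tries: ~0.94
```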

On top of that, you have to prevent those problems from being farmed, that is, placed on some other website and solved by humans there. I think you need to include the URL in the answer to prevent this.

Taking all of this together, I’d say it’s a damn tough job to find such problems that are still solvable by a great number of users.

On the other hand, I like the idea of having some server-side AI that tracks which posts are marked as spam by enough users and flags things that look similar as “possible spam”. Judging from the learning rate of such agents with a single input, this could work, as it distributes the work of training the agent across a large number of users (and they don’t even need to know it).

Instead of the Spaceballs bit, this might be better – from Star Trek, “The Omega Glory”

Dr. McCoy: “Spock, I’ve found that evil usually triumphs, unless good is very, very careful.”

And, like Rahul Chandran, above, I just can’t wrap my head around how anyone makes money with spam. I mean, it’s OBVIOUSLY spam. Who’s going to click on it?


I think the way to solve the spam problem once and for all (or at least make it very unprofitable) is to educate users. Educate every user of every computer on how to recognize spam and scams. The government should do this, pay for it, and run an outright blitzkrieg of advertising to educate consumers not to buy anything from spam.

It’s the only 100% solution (or as near to one as possible) for making spam unprofitable.

Ah, but Lone Starr wins in the end, so I’m optimistic. :slight_smile:

Anyway, I like the idea of subject-related captchas, but I doubt they’ll work in practice because the questions have to come from some sort of limited catalog. If (when) stackoverflow gets popular enough, targeted attacks will appear.

I’d rather opt to treat all anonymous posts with the most suspicion possible short of being rude. And to counterbalance this, make it easy to create an account and sign on (OpenID, anyone?). Perhaps the two can even be combined: creating simple subject-specific question/answer pairs should be very easy for most users, and this could be used to constantly permute the catalog of available captchas.

What about something like this: when a user makes a post, it sends an e-mail to their account with a one-time link that they need to click to activate their post?

I know you can check e-mail programmatically to find the link to click, but at least it makes certain they have a valid e-mail address.
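
A minimal sketch of that one-time activation link (the key, helper names, and URL are all made up; a real implementation would also store each token and reject it after its first use):

```python
import hashlib
import hmac
import secrets
import time

SECRET_KEY = b"server-side secret"  # assumption: stored securely on the server

def make_activation_token(post_id: str, email: str) -> str:
    """Bind the pending post to the e-mail address it was sent to."""
    nonce = secrets.token_hex(8)
    payload = f"{post_id}:{email}:{nonce}:{int(time.time())}"
    sig = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def verify_activation_token(token: str, max_age_seconds: int = 86400) -> bool:
    """Check the signature and the age; the caller also marks the token as used."""
    payload, _, sig = token.rpartition(":")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    issued_at = int(payload.rsplit(":", 1)[-1])
    return time.time() - issued_at <= max_age_seconds

# The mailed link would look something like (URL is hypothetical):
# https://example.com/activate?token=<post_id>:<email>:<nonce>:<timestamp>:<signature>
```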

@JosephCooney

The problem with questions is that there is only a limited number of them. Sure, questions are a good way to secure a page. For a human being, a simple question like “Which animal can fly? A lion, a bear, a monkey or a bird?” is trivial to solve. For a computer, this is impossible unless it has a complete understanding of the English language, can grasp the meaning of the question, and can find the correct answer. If you could write such a program, you would not use it as a spamming tool, you would sell it to Microsoft for 10 billion dollars :stuck_out_tongue: Then you could just tell your computer what you want it to do and it would understand you. Combine this with speech recognition and you have a Star Trek-like computer: “Computer, … do this and that and finally …”.

The real problem is: if your database contains 200 questions, one day someone will have collected them all, together with the answers, and placed everything into a database, and then a tool can easily detect which question is being asked and look up the answer. Such a scheme will only work if you update the questions at very short intervals, short enough that spammers can never keep such a database up-to-date.
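
To make that harvesting attack concrete, here is a hypothetical bot-side sketch: once the catalog has been scraped, “solving” the question is a plain dictionary lookup, which is why the catalog would have to rotate faster than it can be harvested.

```python
import hashlib

# Hypothetical harvested catalog: hash of the question text -> answer.
harvested_answers = {
    hashlib.sha1(
        "Which animal can fly? A lion, a bear, a monkey or a bird?".encode()
    ).hexdigest(): "a bird",
    # ...one entry per question scraped from the site...
}

def solve(question: str) -> str | None:
    """Return the answer if the question is already in the harvested catalog."""
    key = hashlib.sha1(question.encode()).hexdigest()
    return harvested_answers.get(key)  # None only for questions added since the last harvest
```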

It seems to me that we are going about fighting spammers completely the wrong way. I don’t believe there will ever be a way to completely block one group of people from using an open forum while still allowing everyone else. No matter how ingenious the filter/protection, it will always be circumvented eventually, because there is money to be made in it. So instead remove the cause. Do not let anon posts include links or email addresses. Remove their ability to make money off of spamming your site.

I think one of the reasons that spam is taking over the world is the mistaken philosophy of what Fake Steve Jobs would call the “freetards”. Sometimes free is evil. If every commentary site in the world, if Gmail, Craigslist and similar services, required a $1 upfront payment to register an account then the spammers would go broke.

If you are going to work with ratings (brownie points), etc., I’d suggest rules along these lines (sketched in code below the list):

  • New users can only post to a firehose section; regular users who want to can go to the firehose and mark posts as spam or real. If a post is marked real, it goes to the proper forum/board.

  • Once 1 post is marked as real, that user can post 1 message per day

  • Once 2 more posts have been marked as ‘not spam’, that user can help in the firehose section (mark 1 message per day as spam/not spam)

  • If a post that was marked ‘real’ later gets marked as ‘spam’ (because it was approved by a spammer), both the approver and the poster get a rating of 0 and all their posts go back to the firehose.

Then of course you could go further and say that admins can give special powers to known users to have unlimited ‘spam/not spam’ voting power.
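
A rough sketch of those promotion/demotion rules as code (field and function names are made up; the thresholds follow the list above):

```python
from dataclasses import dataclass

@dataclass
class User:
    real_posts: int = 0        # posts of theirs approved as "real"
    daily_post_limit: int = 0  # 0 = firehose only
    can_review: bool = False   # may mark firehose posts as spam / not spam

def on_post_approved(user: User) -> None:
    user.real_posts += 1
    if user.real_posts >= 1:
        user.daily_post_limit = 1   # one message per day
    if user.real_posts >= 3:        # 1 + 2 further approved posts
        user.can_review = True      # may judge one firehose post per day

def on_approved_post_flagged_as_spam(poster: User, approver: User) -> None:
    # Both the spammer and whoever approved them lose their standing;
    # all their posts go back to the firehose.
    for u in (poster, approver):
        u.real_posts = 0
        u.daily_post_limit = 0
        u.can_review = False
```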

Either way, you can’t solve the spammer problem with technology alone.

This project from Microsoft Research looks promising, using images of cats, instead of words, to identify humans

Isn’t that too easy to guess? Case-insensitive letters + numbers, 5 characters = (26 + 10) ^ 5 = 60,466,176 possibilities.

Sorting 12 images into 2 categories: 2 ^ 12 = 4,096 possibilities.
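
The arithmetic behind that comparison:

```python
text_captcha = (26 + 10) ** 5   # 5 case-insensitive alphanumerics: 60,466,176 combinations
asirra_style = 2 ** 12          # label 12 images cat / not-cat: 4,096 combinations

print(text_captcha / asirra_style)  # ~14,762: blind guessing is roughly 15,000x more
                                    # likely to pass the image version in a single try
```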

If I might make a suggestion–for a CAPTCHA you can’t do much better than thephppro’s text-only CAPTCHA. No website I’ve ever built with it has yet been broken.

http://thephppro.com/products/captcha/

########     ########       ###############      :   ########   :       ########
#::::::#     #::::::#      #:::::::::::::::##        #::::::#         : #::::::#
#::::::#     #::::::#      #::::::######:::::#       #::::::#           #::::::#
##:::::#     #:::::## :    #######     #:::::#       #::::::# :         #::::::#
 #:::::#     #:::::#                #  #:::::#        #:::::# :  :      #:::::#
 #:::::D     D:::::#                   #:::::#   :     #:::::#         #:::::#
 #:::::D     D:::::#                ####::::#           #:::::#       #:::::#
 #:::::D     D:::::#           #####::::::##          :  #:::::#     #:::::#
 #:::::D   : D:::::#         ##::::::::### #     :  # :   #:::::#   #:::::#
 #:::::D     D:::::#     :  #:::::#####       :            #:::::# #:::::#
 #:::::D     D:::::# #     #:::::#       :  :      :        #:::::#:::::#
 #::::::#   #::::::#  :    #:::::#                           #:::::::::#
 #:::::::###:::::::#       #:::::#       ######               #:::::::#
  ##:::::::::::::##        #::::::#######:::::#   :            #:::::#
    ##:::::::::##          #::::::::::::::::::# :               #:::#
      #########       #    ####################           #      ###

Combined with a spam-fighting CodeIgniter plugin I’ve written, it seems to be amazingly effective. This plugin uses several techniques to fight spam:

An encrypted timestamp (combined with something unique about the user–perhaps an IP or user agent?) placed in a hidden field–if the form is older than one hour or so, or the IP/user agent doesn’t match, block the submission.

A text field or textarea with an easily-readable, spam-worthy name, such as “comment” or “post”, placed off the viewable area via CSS positioning–if it’s filled in, block the submission.

A text or audio-based CAPTCHA–if there’s no image to use OCR on, it’s a little difficult to break it!
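
The plugin itself is CodeIgniter/PHP; here is a language-neutral sketch of the first two checks (field names, the one-hour window, and SECRET are assumptions, not the plugin’s actual API, and the timestamp is signed rather than encrypted, which is enough for the check described):

```python
import hashlib
import hmac
import time

SECRET = b"server-side secret"  # assumption

def issue_form_token(ip: str, user_agent: str) -> str:
    """Value for the hidden field: timestamp plus a signature over timestamp/IP/UA."""
    ts = str(int(time.time()))
    sig = hmac.new(SECRET, f"{ts}|{ip}|{user_agent}".encode(), hashlib.sha256).hexdigest()
    return f"{ts}|{sig}"

def submission_looks_like_spam(form: dict, ip: str, user_agent: str) -> bool:
    # Honeypot: "comment" here is the CSS-hidden decoy field with a spam-worthy
    # name; the real textarea is named something else. Humans never fill it in.
    if form.get("comment"):
        return True
    # Signed timestamp: reject forms older than an hour, or signed for a
    # different IP / user agent.
    try:
        ts, sig = form["form_token"].split("|")
    except (KeyError, ValueError):
        return True
    expected = hmac.new(SECRET, f"{ts}|{ip}|{user_agent}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return True
    return time.time() - int(ts) > 3600
```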

@Darrell This idea was stolen by MS. I already had this scheme in action (for posting to a blog) almost a year ago, and it was dugg on DIGG.com (that’s how I found it in the first place). Already at that time I saw the weak spot: your database will have images of how many different cats? 100? Okay, if you know the MD5 checksum of each of the cat images, a spam bot can take all the images, calculate their checksums, check them against its database, and it has identified the cat images. So you would need at least some random data in each image that changes every time the same image is displayed.

Even then a pattern-matching algorithm would work (I know a nice tool that finds duplicate images on your HD even if the dupe has a different resolution, some text written on it that isn’t in the original, some colors changed, and so on; it still knows it’s basically the same image, and the failure rate of this tool is below 5%). Also, you lock out blind people completely; how can they recognize a cat?
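
A hypothetical attacker-side sketch of that checksum attack (the hash values are placeholders):

```python
import hashlib

# Hashes of cat images harvested from earlier challenge rounds (placeholder values).
known_cat_hashes = {
    "9e107d9d372bb6826bd81d3542a419d6",
    "e4d909c290d0fb1ca068ffaddf22cbd0",
}

def is_known_cat(image_bytes: bytes) -> bool:
    """Exact-match lookup; defeated by per-request noise, which is why the
    commenter points to perceptual (near-duplicate) matching as the next step."""
    return hashlib.md5(image_bytes).hexdigest() in known_cat_hashes
```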

@hk I pretty much agree with most of the things you wrote.

More generally speaking, the question is: what is the real solution?

  1. Avoid spam getting posted by some complicated CAPTCHA like scheme?

  2. Don’t care for spam getting posted, but have a computer find out what is spam by some super clever application (however this might work)

  3. Don’t care for either and hope users will mark spam posts as spam.

(3) is not a good solution IMHO. Think of a site getting 10,000 spam posts a day compared to 200 user posts a day. Do you expect those 200 users to do all the work of tagging the 10,000 spam posts as spam?

(1) is the problem I fail to see an ultimate solution for.

(2) would be perfect, but I fail to see how the application can really recognize at least 99% of all spam posts.

Spam is also a very subjective term: what I might see as acceptable might be tagged as spam by someone else. That aside, I’d say the solution is (4):

  4. Outlaw spam all over the world, punish spammers hard, and make sure this law is enforced by all means.

Laws are not always the way to go; laws can’t solve every problem of society. However, in some cases they have already worked: a lot of people around the world have been arrested for spamming and had to pay heavy fines. But since the Internet is worldwide, as long as there is at least one country that will not act against spammers, spammers will simply spam from there.

Craigslist personals are targeted because there is a large number of desperate and stupid (a very bad combination) people on it. Do you really think spam is going to be a problem on SOF?


Caveman throws rocks at another caveman.
That caveman responds by wearing a thick animal hide for protection.
First caveman invents sharp pointy stick to stab through hide.
Second caveman invents shield to protect against sharp pointy sticks.
First caveman invents club to bash through shield.
Second caveman invents armor with extra padding to protect against club.

Thousands of years later:
First caveman invents long range missiles.
Second caveman invents interceptor missiles.
First caveman invents bomber aircraft.
Second caveman invents anti-aircraft.
First caveman invents stealth aircraft.
Second caveman invents radar tuned to detect stealth.
First caveman invents nuclear ICBMs.
Second caveman goes about inventing a “Star Wars” shield.

And so it goes. And so shall the spam wars go.

The question is whether today’s status quo is closer to throwing rocks or firing nukes. I suspect we’re still fairly young in the evolutionary process.

@Mecki: “For a human being, a simple question like “Which animal can fly? A lion, a bear, a monkey or a bird?” is trivial to solve. For a computer, this is impossible”

True, but even the dumbest computer has a 1-in-4 chance of guessing it at random.
For such questions to work, they have to be more open-ended rather than a selection from a limited set of choices, which just ends up frustrating genuine users.

The other approach is to go for simpler multiple choice questions with far more possible answers. This at least reduces the hit rate from guessing. (e.g. show a 20 x 20 grid of coloured squares and ask the user to click on the red one. Reduces the hit rate from 1-in-4 to 1-in-400).

Possibly objectionable to users (as it would require more effort), but could small culture-tuned rebuses be used to represent CAPTCHA phrases?