You have to remember that 50% of the world’s population has a lower than average IQ (obviously). It’s a bit cruel asking them to answer even the simplest of questions.
p.s.
CLICK HERE For hot orange babes with MASSIVE oranges who want to suck your orange - these orange sluts will make your orange 5 INCHES LONGER in just ONE WEEK!!!
Get cheap orange MEDS online from OnlineOrangePharmacy etc…
For sites that are concerned with signups - as opposed to, say, codinghorror, where it’s just a login - why not just set a minimum time? Somewhere between 1 and 2 minutes?
For a person entering all of their information and reading the whatnot on the page, they might not even notice that 1-2 minutes have passed since the page loaded, whereas an algorithm (or paid person) that is able to crack the captcha can then only do it 720 to 1440 times a day. It’s not a deal breaker, but it definitely shifts the supply curve for the people who are in it for the money.
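A rough sketch of how the minimum-time idea could work, assuming the server embeds a signed timestamp in a hidden form field and rejects submissions that come back too quickly. The names `SECRET_KEY`, `MIN_SECONDS`, `issue_form_token` and `submission_allowed` are made up for illustration, and the 90-second threshold is just a value inside the 1-2 minute range suggested above.

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret; a real site would load this from configuration.
SECRET_KEY = b"replace-with-a-real-secret"
MIN_SECONDS = 90  # assumed threshold inside the suggested 1-2 minute window


def _sign(issued_at):
    """HMAC the timestamp so a client cannot forge an older one."""
    return hmac.new(SECRET_KEY, issued_at.encode(), hashlib.sha256).hexdigest()


def issue_form_token(now=None):
    """Return 'timestamp:signature' to embed in a hidden form field."""
    issued_at = str(int(time.time() if now is None else now))
    return f"{issued_at}:{_sign(issued_at)}"


def submission_allowed(token, now=None):
    """Accept the form only if the token is genuine and old enough."""
    try:
        issued_at, signature = token.split(":", 1)
        issued_ts = int(issued_at)
    except ValueError:
        return False
    if not hmac.compare_digest(signature.encode(), _sign(issued_at).encode()):
        return False
    current = time.time() if now is None else now
    return current - issued_ts >= MIN_SECONDS


def max_signups_per_day(delay_seconds):
    """Upper bound on back-to-back signups per day for one session."""
    return (24 * 60 * 60) // delay_seconds
```

With a 60-second delay `max_signups_per_day` gives 1440, and with 120 seconds it gives 720, which is where the figures above come from. The bound is per session, so it only limits an attacker who cannot open many sessions in parallel.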
Part of the protection inherent in schemes like Guillaume’s fluffy/not-fluffy is that it’s there on his site, and on very few others; same with the “who is this character you should know about?” scheme. For each one individually, it might be easy to write a data-driven script that gets a reasonable number of successes by chance. But to crack both of them, you need two sets of precomputed response data, at least.
Now you can get into a personal site and at most a smallish number of fan sites. Big deal.
If there are many flavours of site using these approaches, each with a different set of possible responses to subject-specialist challenges, the work required to overcome any arbitrary site increases. Yes, it would be possible to build up a database mapping each site to its plausible vocabulary, but it would at least incur a cost to build that up. It’s just another instance of diversity vs monoculture.
The places that will always have the problem are the all-comers type site, like Google, and I don’t have a solution for that, especially under the plausible assumption that the attacker is using a botnet to spread his signal.
Research has noted that the best captchas combine varying character size, character font type, character colour, character positioning, and background. Now, if you randomise these factors, i.e. each character of the captcha word is rendered uniquely, you should theoretically get a very good captcha.
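To make the per-character randomisation concrete, here is a minimal sketch that only generates the rendering parameters for each character; actually drawing the image is left out. The font names, palette, size ranges and background styles are arbitrary placeholders, not values from any research.

```python
import random
import string
from dataclasses import dataclass

FONTS = ["serif", "sans", "mono", "script"]              # hypothetical font names
COLOURS = ["#203040", "#802020", "#206020", "#604080"]   # arbitrary palette
BACKGROUNDS = ["noise", "lines", "gradient"]             # arbitrary background styles


@dataclass
class Glyph:
    char: str
    font: str
    size: int
    colour: str
    x: int
    y: int
    rotation: int


def random_captcha_spec(length=6, rng=None):
    """Pick independent size, font, colour and position for every character."""
    if rng is None:
        rng = random.SystemRandom()  # OS-backed randomness, harder to predict
    glyphs = []
    x = 10
    for _ in range(length):
        size = rng.randint(18, 36)
        glyphs.append(Glyph(
            char=rng.choice(string.ascii_uppercase + string.digits),
            font=rng.choice(FONTS),
            size=size,
            colour=rng.choice(COLOURS),
            x=x + rng.randint(-3, 3),   # small horizontal jitter
            y=rng.randint(5, 25),       # vertical jitter
            rotation=rng.randint(-30, 30),
        ))
        x += size  # advance so characters do not all land on the same spot
    return glyphs, rng.choice(BACKGROUNDS)
```

The expected answer is just `"".join(g.char for g in glyphs)`, which the server would keep to itself while sending only the rendered image.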
To follow up on my last comment… As for email or other ‘identity’ providers, why not require two other forms of ‘identity’? That technique is used to get a new driver’s license, a new bank account, a passport, and even a job. I think it is perfectly reasonable to expect one or more forms of identity to create another form.
Obviously the issue with this is that a majority of people are not going to be willing to give some forms of personal identification such as an SSN#, birth certificate, last utility bill, or last paycheck. Additionally, a number of these are culturally dependent. However, there are forms of ID that anyone can get where the identity provider has done some kind of validation of the identity.
The first that comes to mind is a cell phone number. Some cell phones do not require identity such as ATT GoPhones. But they do cost some money. It is doubtful that it would be worth it to a spammer to buy 1000 phones, use them to get other online identities, and then take the time to sell those phones again. However, if that is a concern, you can require that the phone be from a subset of cell phone providers that do require billing information from the user.
Does anyone else have an idea for another identity that a web site can reasonably request and VALIDATE, and that a user would be willing to supply? Ideally, there should be at least three different identity options, letting the user decide which two they have available to supply.
This comment does NOT apply to validation of an anonymous user when posting a blog comment or forum post. Those sites should either require login, or rely solely on statistical filtering to determine spam based on content.
I’ve got four Gmail accounts - one that is my real name (joe.q.bloggs@, as a purely random example) for jobhunting and similarly official stuff, one that is historic and I’ve had for ages and that people I know casually use to contact me, one that is sufficiently extant I can use it for random purposes, and one that is used solely to sign up for forums and other such services to alleviate some of the spam. This isn’t even a remotely unusual use of Gmail: part of the very draw of these services is that you can throw out multiple accounts for whatever purposes you need.
In the face of that:
How are you going to compare-validate? If someone gives you a document reference that’s a duplicate to the one in your system, are you going to say ‘no, no second email for you’? That’s taken away half of the usability of the system in a stroke, and you’d better hope that your single address doesn’t get hit by too many spammers that end up making it unusable - since you can’t get a new address to ‘start over’ without destroying your old one. If you’re not going to compare-validate what’s the point in demanding ID that anyone could make up? (See: address+postcode fields in hotmail signup.)
How are you going to verify-validate? If someone gives you a cellphone number are you going to call them up and ask ‘hey, did you just sign up an email address?’ You have to, otherwise how do you know they didn’t just pluck a number out of midair? Multiply that amount of bureaucratic hassle by the number of gmail accounts there are. Then multiply that exponentially to be able to validate on things that are truly unique and have heavy information restriction in place such as SSN#s or National Insurance numbers. I’m reasonably sure the Data Protection Act would not exactly consider your SSN critical for signing up for free email. Compare that hassle to Hitting The Big Red Ban Button for too many hits from one IP or for too many suspicious captcha fails.
(As a side note, together with my 4 Gmail accounts I own 0 mobile phones.)
Are you going to trust bank details, billing details, stringent sets of contact details, or SSN/NI details to sites that are favorite targets for iframe / phishing / spamvirus attacks?
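For comparison, the “Big Red Ban Button” mentioned above is cheap to build. A minimal sketch, assuming hits and failed captchas are counted per IP in a sliding time window; the class name `BanButton` and the thresholds are invented for illustration.

```python
import time
from collections import defaultdict, deque


class BanButton:
    """Ban an IP once it produces too many events inside a sliding window."""

    def __init__(self, max_events=20, window_seconds=60):
        self.max_events = max_events          # assumed threshold
        self.window_seconds = window_seconds  # assumed window length
        self._events = defaultdict(deque)
        self.banned = set()

    def record(self, ip, now=None):
        """Record a hit or failed captcha from ip; return True if ip is banned."""
        if ip in self.banned:
            return True
        now = time.time() if now is None else now
        events = self._events[ip]
        events.append(now)
        # Forget events that have fallen out of the window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        if len(events) > self.max_events:
            self.banned.add(ip)
            del self._events[ip]
            return True
        return False
```

This is in-memory only and keyed purely on IP address, so it does nothing against an attacker spread across many addresses.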
In response to Steve:
“Now you can get into a personal site and at most a smallish number of fan sites. Big deal.”
Big Deal indeed for those who run those sites, should they be compromised. Unless there is a suitable message throttling mechanism in place, it may lead to a DoS attack and perhaps significant overage charges from one’s ISP.
You are absolutely correct in your comment that “all-comers” such as Google face the biggest issues.
The main point is that one should never be too self-assured of the security of one’s CAPTCHA method. There are plenty of methods that spammers can use to break a CAPTCHA, and any CAPTCHA that can be seen and analyzed by a person can be similarly analyzed by a computer…
A big question is: if we still wish to proceed in using CAPTCHAs, how does one develop a CAPTCHA that is complicated - perhaps random - enough that a computer would have a difficult time with it without making it so complicated that it becomes an obstacle for a human?
Why not use better stalling/probationary techniques? Allow 1 e-mail per day until enough long-term users have not called your e-mail spam, then weed out the long-term accounts that approve accounts later marked as spam senders.
x (probationary user) has sent you this e-mail is it spam? y/n
Sending the first 10 e-mails requires captchas of various forms; failing any of them deletes the account. 20% success wouldn’t be good enough.
Any mass mailing activity in the first 30 days = deletion.
Have people prove their nationality by finding the bad grammar in a story, then monitor how many e-mails are sent to countries not speaking that native language. (oh noes, you may needs grammar to sends males to the internets - wouldn’t that be a bonus.)
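The probation rules above (one e-mail a day, captchas on the first 10 e-mails, deletion for mass mailing in the first 30 days) could be sketched roughly as below. Two things are assumptions: probation is simplified to a fixed 30-day age rather than waiting for recipients to vouch for the sender, and “mass mailing” is taken to mean 20 or more recipients, since no threshold is given.

```python
from dataclasses import dataclass

PROBATION_DAYS = 30
CAPTCHA_EMAILS = 10
DAILY_LIMIT = 1
MASS_MAIL_RECIPIENTS = 20  # assumption: "mass mailing" is not defined above


@dataclass
class Account:
    age_days: int
    emails_sent: int = 0
    sent_today: int = 0
    deleted: bool = False


def try_send(account, recipients, captcha_passed):
    """Return 'sent', 'blocked' or 'deleted' for one send attempt."""
    if account.deleted:
        return "deleted"
    on_probation = account.age_days < PROBATION_DAYS
    # Any mass mailing during probation deletes the account.
    if on_probation and recipients >= MASS_MAIL_RECIPIENTS:
        account.deleted = True
        return "deleted"
    # The first 10 e-mails each need a solved captcha; one failure is fatal.
    if account.emails_sent < CAPTCHA_EMAILS and not captcha_passed:
        account.deleted = True
        return "deleted"
    # Probationary accounts get one e-mail per day.
    if on_probation and account.sent_today >= DAILY_LIMIT:
        return "blocked"
    account.emails_sent += 1
    account.sent_today += 1
    return "sent"
```

Resetting `sent_today` at midnight and the “is it spam? y/n” feedback loop are left out to keep the sketch short.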
But basically, just continue to imagine new tricks that computers haven’t been programmed to defeat yet, and cycle through them randomly. Then when one technique is defeated, remove it from the rotation and add 2 more unsolved techniques. Eventually you’ll have a collection of 100’s or 1000’s of tricks to be solved and solving any one of them will have a very low rate of return.
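The rotation scheme can be sketched in a few lines, assuming each technique is just a name and that there is a reserve list of not-yet-deployed ideas. The class `ChallengeRotation` and its method names are invented for illustration.

```python
import random


class ChallengeRotation:
    """Pick challenges at random; swap a defeated one for two fresh ones."""

    def __init__(self, techniques, reserve, rng=None):
        self.active = list(techniques)
        self.reserve = list(reserve)  # ideas not yet in rotation
        self.rng = rng if rng is not None else random.SystemRandom()

    def pick(self):
        """Choose the challenge type for the next visitor."""
        return self.rng.choice(self.active)

    def retire(self, name):
        """Remove a defeated technique and add up to two from the reserve."""
        self.active.remove(name)
        for _ in range(2):
            if self.reserve:
                self.active.append(self.reserve.pop(0))

    def attacker_hit_rate(self, solved_count):
        """Chance that a random pick lands on a technique the attacker solved."""
        return solved_count / len(self.active)
```

With 1000 techniques in rotation and one of them solved, `attacker_hit_rate(1)` is 0.001, which is the “very low rate of return” being described.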
I see two keys below the F on mine… neither of which is the one key below the F on a Dvorak… also, it is not necessarily the case that everyone can receive cellphone texts…
“You have to remember that 50% of the world’s population has a lower than average IQ (obviously). It’s a bit cruel asking them to answer even the simplest of questions.”
Umm, you mean 50% of the world’s population has a lower than MEDIAN IQ. Guess we know which half you’re in… wokka wokka.
When Littlefoot’s mother died in the original ‘Land Before Time,’ did you feel sad?
( ) Yes
( ) No
(Bots: No lying)
Joking aside, any solution should address human spamfarms. Like a, “What’s the name of this site?” or “What color is this site’s background? Yellow, white, or blue?” where multiple choice is not radio buttons, but a text field, and the question asks something about the context that’d be removed in a spamfarm.
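A small sketch of the context-question idea, assuming the answers are typed into a free-text field and compared after basic normalisation. The questions and accepted answers in `SITE_CONTEXT` are placeholders; a real site would fill in its own.

```python
import random

# Hypothetical facts about the page that a visitor can see but that
# would be stripped away if only the form were relayed to a spamfarm.
SITE_CONTEXT = {
    "What is the name of this site?": {"example blog"},
    "What colour is this site's background?": {"white"},
}


def normalise(answer):
    """Lower-case and collapse whitespace so ' White ' matches 'white'."""
    return " ".join(answer.lower().split())


def pick_question(rng=None):
    """Choose one context question to show next to the text field."""
    if rng is None:
        rng = random.SystemRandom()
    return rng.choice(list(SITE_CONTEXT))


def check_answer(question, answer):
    """Accept only a typed answer that matches the page's own context."""
    accepted = SITE_CONTEXT.get(question, set())
    return normalise(answer) in accepted
```

Because the field is free text, the form itself gives no list of options to guess from; the options only appear in the question wording if the site chooses to put them there.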
The problem is that for something like Google or Hotmail, the site’s too well known and the reward for cracking is too high for most captchas, including context-based questions, to be effective.
Also, what would we call a CAPTCHA that is meant to thwart human spammers?