Is Amazon's Mechanical Turk a Failure?

Amazon's Mechanical Turk Service is a clever reference to the famous chess-playing hoax device, The Mechanical Turk. The Mechanical Turk dates back to 1770, and has quite a storied history. Read through the Wikipedia article if you have time; it's fascinating stuff.


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2007/04/is-amazons-mechanical-turk-a-failure.html

$5 an hour?

No wonder their code S$#s.

I suspect that Amazon were hoping that Mechanical Turk would be attractive to companies trying to get work done by people living in low-wage countries - where $2.31 for a 20-min task doesn’t seem like such a bad idea.

According to the BBC (http://news.bbc.co.uk/1/hi/business/4436692.stm), the average pay of a software engineer in India is around $5/hour - offering $2.31 for 20 minutes’ work (about $6.93/hour) isn’t unreasonable in those circumstances.

Um, one thing you forgot. Writing a review about a movie is inherently fun. Writing a plot description isn’t. Posting your opinion to your own blog is fun. Posting what somebody else tells you to write to a blog you don’t care about isn’t. Submitting links to Digg which others will find entertaining is fun. Submitting junk faxes that you know people will hate getting is not.

The free activities of the social web work partially because of the merit-based rewards of community standing, but they also work because they fulfill people’s inherent need to express themselves, to see their mind validated in print. The job is automatically its own reward!

As the first poster pointed out, free and open source software works off a meritocracy as well - in addition it is a marketable skill that’s in demand, so it looks good on a resume. BUT (and this will be unbelievable to those who aren’t programmers by nature) programming is also FUN! To a certain kind of person, inventing new programs that make the computer do nifty stuff for them is rewarding all by itself. Even though I’ve never put an open source project out there (because I’m just not that good at programming) I have quite a few little programs I’ve written for myself and my household to solve little problems over the years - and every one of them counted as recreation, in the same way that needlepoint and origami are fun and relaxing hobbies (for being creative acts) and crosswords and mazes are fun (for being stimulating problem-solving exercises).

Hence, when you posted the “Buzz-Fizz” problem, the comments were immediately flooded with solutions, even though that wasn’t the point of the post and even after you asked for it to stop. You’re waving a cat in front of a dog and expecting it not to lunge for it, against thousands of years of evolution.

I did a lot of the work for one of the most prolific users of Mechanical Turk for a long time, and we had a very mixed bag of results.

The biggest challenge for using Mechanical Turk is how to decide whether a particular worker’s answer is correct. If a computer program could judge the answer, you wouldn’t need a human in the first place. If someone is going to decide on each response, then it would probably be easier for them to just do the work in the first place.

Amazon ran into this problem first. The first big set of tasks was to choose a photograph that best showed the storefront for a business. Many people tried to earn quick bucks by either picking any photo at random or writing a script to submit a random answer.

We eventually came up with a scoring system that rated workers on their agreement with other workers. A computer program compared previous results with answers to decide whether we should trust a particular answer. If the computer program couldn’t decide, it referred the matter to a human (me) for authoritative judging. It took a while until it could judge all the answers without human intervention, but it eventually got there.
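For anyone curious what agreement-based scoring can look like, here is a minimal sketch in Python. It is not the system described above, just one plausible reading of it: the function names, the 0.5 default score for unknown workers, and the 0.6 acceptance threshold are all assumptions made for illustration. Workers are scored by how often they match the majority answer on past tasks, and a new answer is accepted only when trusted workers back it strongly enough; otherwise it is referred to a human.

```python
from collections import Counter, defaultdict


def majority_answers(responses):
    """Map each task to its most common answer.

    responses: iterable of (worker, task, answer) tuples.
    """
    by_task = defaultdict(list)
    for _worker, task, answer in responses:
        by_task[task].append(answer)
    return {task: Counter(answers).most_common(1)[0][0]
            for task, answers in by_task.items()}


def worker_scores(responses):
    """Score each worker by the fraction of answers matching the majority."""
    responses = list(responses)
    consensus = majority_answers(responses)
    agreed, total = Counter(), Counter()
    for worker, task, answer in responses:
        total[worker] += 1
        if answer == consensus[task]:
            agreed[worker] += 1
    return {worker: agreed[worker] / total[worker] for worker in total}


def judge(task_answers, scores, threshold=0.6, default_score=0.5):
    """Return the accepted answer, or None to refer the task to a human.

    task_answers: {worker: answer} for one new task.
    threshold and default_score are illustrative values, not tuned ones.
    """
    weight = defaultdict(float)
    for worker, answer in task_answers.items():
        weight[answer] += scores.get(worker, default_score)
    total = sum(weight.values())
    if total == 0:
        return None  # nobody trustworthy answered
    best = max(weight, key=weight.get)
    return best if weight[best] / total >= threshold else None


history = [
    ("ann", "t1", "A"), ("bob", "t1", "A"), ("cat", "t1", "B"),
    ("ann", "t2", "C"), ("bob", "t2", "C"), ("cat", "t2", "D"),
]
scores = worker_scores(history)
accepted = judge({"ann": "X", "cat": "Y"}, scores)    # trusted worker wins
escalated = judge({"ann": "X", "bob": "Y"}, scores)   # trusted workers split
```

In this toy history, "cat" never matches the majority, so her vote carries no weight; when two equally trusted workers disagree, the task goes to the human judge.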

MT is a failure only in the sense that it hasn’t revolutionized the relationship between computers and humans, which is a pretty tall order. There just aren’t that many tasks that are easier for a human than for a computer AND which can be farmed out to potentially unreliable workers.

It’s a success in that, for problem spaces where both of the above qualities are present, MT works great. Reading numbers off of documents, figuring out the name of an album from a photograph of the cover, and so on are all the sorts of tasks that MT does well at.

MT has also been used for things that it’s not so good at, such as naming your “top 3” of something or other. (This was one of Amazon’s seed tasks.)

We thought that most of the MT workload would come from overseas, namely China, Korea, and Indonesia, where paying 1 cent for a few seconds of work might be a good deal for both parties. However, we found that most of our workers were in the US, and as a group, they really wanted to get paid a lot for doing very little. There were some exceptions, of course, but most of my correspondents were somewhat indignant that they were not able to make a living off of determining not-so-subtle characteristics in data.

Give it a few years, then look again.

Another good one is the ESP game. If you haven’t done so, watch the Google Techtalk video on Human Computation and you’ll see what I mean.

Chris

The ESP game is an example of using intrinsic value (FUN!) to extract additional value.

Game:
http://www.espgame.org/

Presentation by the creator:
http://video.google.com/videoplay?docid=-8246463980976635143

So I should read the comments and not dupe, but I get bonus points for including links. :slight_smile:

The $5/hr for coding in India has to be purchasing power-adjusted. In the local currency you can live quite comfortably.

Most of us grew up on video games and hi-score lists. Most of us have been training our brains to release feel-good chemicals when we receive points. It feels really good, and that’s often reward enough.

I think Amazon could do pretty well by changing their model. Requesters buy points from Amazon, and then assign a point value to a task. Put up a hi-score list for the workers, and offer a points-to-dollars exchange rate. I bet you’d see lots of workers playing solely for the points.

How much were you not paid to write this article? :wink:

About as much as I’m not being paid to contribute to it! :slight_smile:

Chris beat me to it. The Human Computation video is the best thing I’ve seen in ages. One of those moments when you realise that the person you’re listening to is scarily smart.

Cheers,
Colin

I just got an email from the HR department–they said they will be increasing our intrinsic rewards and henceforth will no longer be paying us a salary. Wow, I’ve never felt so rewarded, but somewhat betrayed. Fortunately, the new hires will not be paid in the first place and they will only feel pure intrinsic reward for the duration of their careers.

Wow, I never expected my dad to show up in your blog. I’ll have to ask him how he knows Mary Poppendieck.

A couple of things:

  1. I visit (and buy things at) Amazon’s site regularly (as Jeff said, primarily because of the reviews). Until this article, I’d never heard of their Mechanical Turk program.

  2. I keep all my Amazon reviews in a big Word document here. Over the years, it looks like I’ve contributed 213 pages (or 97,055 words) of free reviews to Amazon. If they want to give away money, I’ll gladly accept 5 cents a word :slight_smile: .

  3. While my son was growing up, I gave him an allowance of $1 per year of age (starting when he could understand what was happening). The purpose of that allowance was to allow him to learn how to handle money. I let him know that was what I was doing and that the allowance would stop when he reached 16. At that point, if he wanted money, he’d have to get a job. He seems to handle his own money OK, so I guess that was a success.

You make a very valid point about the monetary rewards. Arguably, one of the most popular uses of the Mechanical Turk has been for a task for which there was no monetary reward at all. In January 2007 when database pioneer Jim Gray disappeared, some folks got a Digital Globe satellite to fly over the area of the Pacific where his boat may have been. The problem is that the satellite returned black and white images with a feature resolution of slightly better than 1m. Jim’s boat would then have shown up as a little white speck about 6-7 pixels long and 2-3 pixels wide. Machine techniques to filter the data would have been very difficult because of the small feature and the number of potential false positives. However, by using the Turk and providing no rewards, people were able to search through thousands of the images pretty quickly. All a person had to do was check whether an image contained anything worth a closer look. A perfect example of a non-monetary reward motivating people, many of whom were highly paid computer professionals, to take some time out of their schedule and do something.

I’m certainly in the camp of not tying allowance to chores. I know for certain that my son would gladly do without his allowance to avoid doing the dishes, if I gave him the option.

I participated in the Mechanical Turk program while it was still in its beginning. However, back then, the jobs offered differed from those which are offered now. They were not spammy and were quick to do. Most of the jobs offered were to identify, among a series of 5 pictures, the one that most accurately showed a business or an address that was supplied. The pay was 5 - 10 per image, but it took only 20 seconds to complete the task.

I think that this approach was pretty nice because the complexity of the task was quite big for a computer but trivial for a human being.

I stopped participating when I left my job where I had access to the internet and pretty much nothing else to do. Managed to get a book out of them with this.

I have worked at Mturk now for almost a year and have made over $1000 doing various types of HITs. The ones that paid anything worth a darn and kept me at it were mostly the RQs (research questions) asked from NowNow as well as Askville, but things have changed over the last week and I am not really sure what is going on. It is hard to only do the answering at Askville.com now as I don’t know what the “coins” are going to be worth, maybe only products, not cash.
I think the idea is a good one for businesses such as CastingWords and others, and it was good for those of us who could manage to get Great Answer Votes from the RQs and thereby stay in the top 30 earning the bonus.
The bonus made it all worth the competition, which is a driving force for me too. I like competition.
Uclue pays more, but I can’t get in there yet, they already have their set people.

My question for the father mowing the lawn is: do the other teenagers have jobs? Do they have an actual respect for the value of a dollar, or are they just spoiled and lazy?

Patrick Wagstrom: Did they find him?