Is Amazon's Mechanical Turk a Failure?

Minor clarification: there aren’t 128 tasks in Mechanical Turk right now, there are 128 types of task, each of which has up to several thousand actual paying tasks in them. For example, the top task type today is “GIS - Image Tagging” which represents 2,397 five-cent jobs or around $120 of actual work.
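
For anyone who wants to check that figure, here is a minimal sketch of the arithmetic. The task count and the five-cent price come from the numbers quoted above; the variable and function names are made up for illustration.

```python
from decimal import Decimal

# Figures quoted in the comment: 2,397 tasks at five cents each.
task_count = 2397
price_per_task = Decimal("0.05")


def total_payout(count, price):
    """Return the total dollar value of `count` tasks paying `price` each."""
    return count * price


total = total_payout(task_count, price_per_task)
print(f"${total}")  # prints "$119.85", i.e. "around $120"
```

Decimal is used instead of float so the cents add up exactly.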

Telos: no, Jim Gray was never found.

The irreversible nature of monetary rewards reminds me of working in retail.

I used to work in IT for a local retailer with 20+ department stores and 80+ lower-end stores. Their constant lament was that the shoppers would only buy items on sale. Items not on sale would barely move off the floor. They even tried to target certain items that would never go on sale, so as to condition the shoppers to actually pay normal price from time to time.

It didn’t work very well.

-Eric

I’ve used mTurk for a few months now (as both requester and worker) and while I tend to agree that it is a “solution looking for a problem”, I think that it is a valuable tool in a small website or developer’s toolbox.

That no one has really leveraged it effectively yet does not mean that no one will. I do think there will be some success stories to come out of it in the future.

You should read Douglas Rushkoff’s new book, “Get Back in the Box”. Much of it is devoted to the concept that the most effective workers do their job mainly for the intrinsic satisfaction they get from it, and that adding external motivators decreases their productivity.

Paradoxically, perks like on-site massage and stress reduction classes end up making jobs more stressful by implying that the job itself should be stressful.

Matthew Martin hasn’t got it quite right. The discussion is more about keeping the office open at the weekends so you can, if you choose, come in and work on an interesting project or one of your choosing that may possibly benefit the company. A lot of people would come in (similar to Google’s policy of letting you spend 20% of your time on whatever project you want).

Admittedly, you have to get past the basic pay for your job first, and the intrinsic rewards are not a replacement for doing more company work, but for doing different work to your daily job, work you want to do and are not forced to do.

Check out “Punished by Rewards” by Alfie Kohn for more on intrinsic motivation.

I really wonder how all of these parallel markets will integrate in the end. I especially wonder about who will benefit most from all of this free capital. Amazon certainly does. So do advertisers. The value of Digg, Flickr, Del.icio.us isn’t a matter of technology, but of attention and participation. Playgrounds are the new economy. Just build an amusement park and let the people in as long as they bring something in return, then cash in on their behalf. Artists, writers, engineers, all of their work, you can benefit from, freely, without signing a single check, or even noticing their name. They are your new consumers.

I think the McLuhan people call this a reversal…

fascinating post.

Not long ago I had a chance to speak with Peter Cohen, director of Amazon’s Mechanical Turk, and the guy who originally championed the idea for the service. We were discussing the value of documentation. I was bemoaning the fact that my employer might adopt an advertising-driven revenue model. Peter said there were only two ways of making money on the Internet: advertising or subscription.

I pointed out that Wikipedia seemed able to make its mark with neither. I’ve worked on products where we used a Wiki to get the community to drive creation of content. We didn’t have the success of Wikipedia with our documentation project, but there is some prestige associated with contributing to a Wiki project. I wish I had thought to mention the value of the Amazon user reviews to bolster my point.

I think the real value of the Mechanical Turk service is the way the community springs up around it. Philip Greenspun says the one thing the Internet is good for is building communities. All the profitable Internet sites that survived the bust in 2000 had built communities around themselves: Amazon and its user reviews, eBay and its transaction voting. Mechanical Turk has already seen its own grass roots effort to build a community spring to life and Amazon had little to do with it.

I’m not sure I agree that the effort has failed already, but I do wonder if Amazon and Peter Cohen have the right stuff to move it forward.

Interestingly, the author behind mTurk should know better – his thesis was about convincing users to contribute human effort without realizing it. The ESP game, which someone else mentioned, is his work. Amazon seemed to like it for their various tasks for which they could find no usable software.

I tried it out at launch, and it seemed promising, but a few years later and the system hasn’t really improved much. I always figured it was intended as a platform for testing AI techniques against humans, without key people knowing it. In that light, hourly rates aren’t as important as HITS x price; you figure if you’re the only one working on AI, you’ll probably wind up with 90 percent of the HITS. Unfortunately, if this was supposed to be a platform to attract financially motivated AI engineers, it hasn’t improved much.

But maybe Amazon’s goal of finding talented data miners and such has already been met, and the test, as it were, adequately represents the real world problems they face on the internet. In which case, it’d be nice to see some report from the author about it, but I doubt Amazon’s about to announce their findings to competitors.

The company I work for (www.vocalabs.com) does something similar. Actually, a few similar things. And we’ve had no problems with it. But there are some major differences. The biggest is that we’re not trying to generalize into a new kind of work, we’re just taking things that existed before the Internet which happen to be a little more efficient online.

The key here is that MTurk is a solution waiting for a problem, whereas we had problems, and a computer-mediated, distributed solution happened to be the best solution. I suspect there are plenty of situations like ours, where people work online. It’s just that in most cases, it’s not an attempt to realize science fiction.

I think MTurk is a credible idea, just with a bad implementation. They failed to realize that it doesn’t scale with employers. Reputation matters, and I might trust Amazon to pay me fairly, but I don’t trust J-Random-Third-Party. Especially if they can subjectively declare your work unworthy of payment. Nor do I want to work for someone whose values I don’t agree with. Furthermore, the pay scale matters. If it’s not a decent monthly income, you need to repackage it as entertainment, volunteerism, or both.

Here are the things we can do. You can decide for yourself whether or not they are like the mechanical turk. In both cases, the computer does a lot of the work and the person fills in at one crucial step.

Thing #1: we do usability testing for telephone applications. This isn’t much different from taking a survey, except that the participant makes a phone call first. As with regular surveys and in-person usability tests, we give a token payment. (Typically a buck.) But, as others have mentioned, the higher the reward, the lower the motivation. And the more we pay, the more we have to deal with people gaming the system, rather than just helping out.

Thing #2: we have a virtual call center of people who administer customer satisfaction surveys. This is something a computer can do, but having a machine for a complaint department is just plain a bad idea. This is contract employment, but as with the Turk, specific tasks are sent out by the computer. (A survey is triggered when someone hangs up from calling our client’s customer support line.) Our survey administrators expect to make a reasonable income for their time, but the time might be half an hour now and then.

First! http://digg.com/videos/comedy/Video_A_First_Commenter_Tells_All

Not to mention why people contribute to Open Source software.

OK, I have to ask. Were you inspired to write this because the “mechanical turk” was the article of the day (a few days ago) on Wikipedia?

inquiring minds want to know

The theory of intrinsic motivation goes a long way toward explaining
why Amazon’s unpaid user reviews are so popular and effective, and yet
the paid Mechanical Turk service appears to be withering on the vine.

But if we see software development as a game, then everyone is in it for the extrinsic rewards?

An example of “Mechanical Turk Service” with a system of intrinsic motivation is Google Image Labeler : http://images.google.com/imagelabeler/
It uses a high score system (like folding@home) to motivate users to label images.

Nobody’s going to ask what happened to Google Answers? That was a good example of a paid “HIT” service.

http://en.wikipedia.org/wiki/Google_Answers

According to waxy.org, the entire archive just got pulled.

http://answers.google.com/answers/

IMHO the real problem for MTurk lies elsewhere:

There’s no efficient global micropayment system (yet) to support execution of the cheapest tasks.

With 2 billion people in this world eager to work for $1 an hour (or much less), MTurk and similar systems would be ideal to distribute work (and wealth) across the globe.

As proof I’d like to point to ‘project markets’ like rentacoder, getafreelancer and scriptlance which, supported by paypal/e-gold and similar on-line payment systems, prove to be very efficient in matching projects with people.

PS Shameless plug (remove if not allowed, Jeff):
If you’re interested in this space, keep an eye out for my soon-to-be-launched blog on the topic http://www.peopleasbits.com

tangent
One of the reasons MTurk may not have been promoted as much as it could have been may be the very negative reviews it received from people in the USA, offended by the fees offered for the HITs.

It’s always interesting to see how true globalism is seen as a threat to capitalism and not accepted as proof of the much-touted supply/demand theory. What will happen once China gets on a roll? I’ve heard people say “we’ll take care of innovation”, but haven’t we all heard that before (cf. Japan, Hyundai, etc.)?!?
/tangent

Go to http://stardustathome.ssl.berkeley.edu/ and help look for particles of cosmic dust captured in silicon aerogel by a space-probe.

The scientists expect 40-100 particles hidden in many thousands of slides. Actually the slides are more like movies: your browser receives 20 or so separate images taken at slightly different focus.

You use your mouse to change the focus; what is on the surface of each cell is at one focus, and any trapped dust particles are at another.

When you sign up, you go through some practice slides followed by a qualification, using images of particles trapped from a comet’s tail in a similar experiment, except that many thousands of particles were caught, so they were much easier to find. But the scientists aren’t sure what interstellar dust particles will look like; they just guess the comet tail particles will be slightly similar. So they can’t use an image-recognition computer programme because there is nothing to train it with. They mix in these training images from time to time (a bit too often in my view) to keep you on your toes. They also use them to determine your score: number of cells viewed is one measure, but the difference between the number of random training videos you got right and wrong is how they rank people.
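
The ranking described above could be sketched roughly like this. It is only a guess at the shape of it, not the project's actual code: the class, the field names, and the tie-break on cells viewed are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Volunteer:
    # All field names here are hypothetical.
    name: str
    cells_viewed: int
    training_right: int  # training videos answered correctly
    training_wrong: int  # training videos answered incorrectly

    @property
    def score(self) -> int:
        """Right minus wrong on the mixed-in training videos."""
        return self.training_right - self.training_wrong


def rank(volunteers):
    """Sort best first by score. Breaking ties on cells viewed is an assumption."""
    return sorted(volunteers, key=lambda v: (v.score, v.cells_viewed), reverse=True)


volunteers = [
    Volunteer("ann", cells_viewed=500, training_right=40, training_wrong=10),
    Volunteer("bob", cells_viewed=900, training_right=45, training_wrong=30),
    Volunteer("cat", cells_viewed=300, training_right=35, training_wrong=5),
]
ranked = rank(volunteers)
print([v.name for v in ranked])
```

Note that bob has viewed the most cells but ranks last here, because wrong answers on the training videos count against him.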

Let’s not forget that the person inside the mechanical Turk was a slave. That is very much in line with pay offered through Amazon’s service with the same name.