Mixing Oil and Water: Authorship in a Wiki World

The truth in the end is still that the content matters more than the author. Of course we are more interested in the texts of famous authors, and we get motivated by our heroes. But if the same content were provided by someone else, the content would of course be the same.

Now Alan Kay asked how we could find the powerful new ideas. Well, new ideas often come from new people, and new people are not those we already know, like Alan Kay. So we need to pay more attention to new authors as well: if the content is brilliant, it doesn’t matter who wrote it. We can track the author to give credit and so on, but it is the content that should be managed.

In Stackoverflow there are some ways to manage content, but if powerful new ideas come, then those should be considered.

Jeff, this is why, even though I disagree with some of your conclusions, I still read your blog.

AWESOME. Thanks for making such a cool tool (stackoverflow).

Authorship is only an indication of quality (and interest) to me…

If the post has an orange background, I know it’s Jeff responding to something he felt was important, so I read it.

If the blog has more than about 50 replies, then I don’t even bother reading them all (always have work to do).

I like the ability to rank a post up/down, and the natural filtering out of non-contributors which would introduce noise…
So in that sense, ‘rank’ is a better indicator than ‘author’.

When you visit Wikipedia’s entry on asphalt, you get some reasonably reliable information about asphalt.

Wrong. If you visit Wikipedia, you get no reliable information. It is cheap and fit for the masses; if enough people think something is right, you get urban legends instead of information.

A very sophisticated effort to determine the authorship of each bit of text on a wiki page is WikiTrust, see http://wikitrust.soe.ucsc.edu/index.php/Main_Page - it goes one step further and calculates a trust value based on the author’s reputation combined with the time a bit of text remains unchanged.

This is a very interesting project, and it has some nice features for detecting reverts, tracking paragraphs that get moved around the page, etc.

I hope to see this live on Wikipedia in the not-too-distant future. I’ll do what I can to make it happen.

When you look at the imbalanced Wikipedia reporting regarding the plight of the Palestinians, you can draw your own conclusions…

@Julian Radowsky: yeah that is a pretty big failing of ClearType. It is a system-wide setting, rather than monitor specific. To be honest though, I would rather replace one of the monitors than turn off ClearType entirely - without it I feel like I’m having a 90s flashback.

Also your point about the font-size being set to 90%: that is very odd, Tahoma (on XP) is a TrueType font, so it should scale perfectly to any size. It sounds like you have a bitmap font for some reason. Either that or Opera is being weird.

@Jeff: this sudden interest in authorship wouldn’t have anything to do with Joel’s recent trouble would it?
http://www.joelonsoftware.com/items/2009/01/29.html

Your algorithm is still a little off. I wrote an answer, then I edited it several times - no one else has. Stackoverflow rated the answer as 98% MarkJ, not 100%.

http://stackoverflow.com/questions/507291/should-we-select-vb-net-or-c-when-upgrading-our-legacy-apps/508823#508823

http://stackoverflow.com/revisions/508823/list

Jeff,

Is the Levenshtein Distance formulated in terms of dynamic programming? If not, you would likely get a performance benefit from choosing a text distance based on a dynamic programming algorithm similar to those used for DNA sequence alignment, e.g. Smith-Waterman.

http://en.wikipedia.org/wiki/Sequence_alignment#Dynamic_programming

Such might be good enough for a character-wise or word-wise authorship measurement.
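For reference, the textbook way to compute Levenshtein distance is itself a dynamic program (the Wagner-Fischer formulation), so the two approaches are closer than the question suggests. Below is a minimal sketch of it in Python; the function name `levenshtein` is just illustrative, and nothing here is claimed to be what Stack Overflow actually runs. Because it only compares elements for equality, the same function works character-wise on strings or word-wise on lists of words.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    # prev[j] is the distance between the first i-1 items of a
    # and the first j items of b (the previous row of the DP table).
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x from a
                curr[j - 1] + 1,     # insert y into a
                prev[j - 1] + cost,  # substitute x with y (or keep it)
            ))
        prev = curr
    return prev[-1]


# Character-wise and word-wise use of the same function.
char_distance = levenshtein("kitten", "sitting")
word_distance = levenshtein("the quick brown fox".split(),
                            "the slow brown fox".split())
```

Keeping only two rows at a time holds memory proportional to the shorter dimension of the table, though the running time is still proportional to the product of the two lengths.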

Along the lines of revisions and diffs at StackOverflow, what are you using to process and display the diffs?

This is an excellent idea. It’s a shame that the accuracy is a little off, but that’s something that I’m sure will be fixed given some time…

I’d imagine the expense of doing per word calculations is only there because of the sheer quantity of data to work on. If this was done early on (it might not be practical at all now) then maintaining it would be easier if the results were all stored… you would just have to update them on each edit/new post.

I see the usual “Wikipedia is all rubbish” comments are coming out. My experience must be really lucky, because besides the occasional obvious vandalism (gone in an instant) most of the articles are mostly correct, or at least as correct as any printed encyclopedia I have ever read. They do contain common mistakes, but so do other sources …

Did anyone come to a conclusion on who edits articles? I suspect it is a core who copyedit/spellcheck/correct, a larger number who contribute to a small number of articles, and a lot of anons who do small edits (both good and bad).

I think that it’s interesting that you selected the Alan Kay question - I remember when it was initially on the site and it was closed (or very nearly closed - I can’t remember). The only reason that it was allowed to remain on Stack Overflow was because it was a question by Alan Kay. If asked by nearly anyone else, it would have been closed as not a real question (or something). There were comments along the lines of, “Hey, don’t close this - let’s not embarrass ourselves in front of Alan Kay.”

I’m not sure where I’m going with this - I’m not saying that it was a bad question or even that it should have been closed (for myself, I tend to favor not closing questions unless they really, really have no value - I guess I tend toward being an inclusionist).

But I think it’s an interesting observation.

Sounds like elitism, if it requires a Turing Award winner to be able to post something important and otherwise the post gets closed. I mean, the post would have been important even if it had been posted by someone else. There really should be a category for more philosophical questions too. And this question wasn’t even abstract, but a concrete kind of request for proposals about how we could find the powerful new ideas.

Interesting post, but your data is all wrong in terms of Wikipedia. You failed to notice that the stats about Wikipedia authorship you quote in the beginning are from 2006.

Last summer I heard a talk from PARC’s (as in Xerox PARC) Augmented Social Cognition group about who edits Wikipedia, based on analysis of the most recent database dumps.

Wikipedia reached a huge peak, more than doubling in number of active contributors, by May 2007. It’s between one and two thousand very active contributors (which is less than 1% of all registered editors) who contribute 50% of all the content. The other 50% of edits are made by all the less active contributors and anonymous users combined. On average, very active community members added a significantly larger amount of content to the site, and on average anons took away more (e.g. in copyediting and other minor pruning).

As for your system of attaching names/faces to edits and learning who the most active contributors are, they did a project called Wikidashboard (wikidashboard.parc.com) that shows you just that. Overall, I don’t know if very many people use it. For public-facing projects (instead of internal collaboration wikis), people just care about the information, by and large.

Your point about wiki and authorship being opposing goals is also more than slightly off in my experience. There are big wikis (like wikiHow) that quite successfully attach lists of authors to articles and maintain a strong sense of being a wiki. It’s not authorship and wiki that are fundamentally opposing. It’s ownership and wiki that are fundamentally opposing.

I’m not sure what’s significant about this work… Everyone knows that nothing significant has occurred in computing since XEROX PARC in the 70s.

Seriously - when will we see this great new metric on SO?

@Graham Stewart
ClearType is no good on my dual monitor system. The monitors are not the same, and the RGB/GRB sequence is not the same on the two monitors (even though they are the same make). If I tune ClearType to look good on one of the monitors, then it’s blurry on the other.

@Jeff
I have deleted the C fonts (Consolas and Calibri), and there is no difference (using Opera on XP). It seems that your style sheet forcing the fonts to 90% causes spacing problems with Tahoma: the characters bunch up and overlap in Opera (if I zoom to 110% then the spacing is corrected). May I suggest that you remove the force to 90%?

This is great. I know I’ve refrained from making good edits to wiki posts on StackOverflow in the past because I didn’t want to steal perceived ownership (emphasis on perceived). Now that’s not really an issue.

But this post probably belonged on blog.stackoverflow.com.