Slaying Mighty Dragons: Competitive Ranking and Matching Systems

Attending yesterday's Halo 3 launch event at the Silicon Valley Microsoft campus -- and the large Halo 3 tournament we helped moderate -- got me thinking about player ranking and matching systems. Without a well-designed ranking and matching system in place, you'll get horribly mismatched games, where one team demolishes the other by a margin of 3x or more. This is counterproductive for both teams:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2007/09/slaying-mighty-dragons-competitive-ranking-and-matching-systems.html

I do not believe that most players want to achieve a 50% win/loss ratio. Generalising from my own experience, I guess that most players want to win against an opponent who challenges all their skills.

FIRST!

You know, the thrill wasn’t quite what I imagined it to be…

Ideally (to me at least), all games should be the equivalent of beating up on puny wharf rats. When I play games, I want it to reinforce my impression that I am an invincible god.

Very interesting read!!!
Thank you for this sweet article.
It is very hard to achieve a perfect level of difficulty, especially in a game like Halo with a small number of players.
However, in games like Battlefield 2/2142 you can have up to 64 players, which makes every game challenging in some ways but dead easy in others.
One thing I disagree with is rule #2, that not playing the req’d amount of games/week will damage your points. That is just dumb. They obviously ‘want’ you to become addicted, but I’ve gone through that more than enough times, so now it forces me either to not care about my stats or to become, in a way, addicted to the game.

Cheers.

One thing I disagree with is rule #2, that not playing the req’d amount of games/week will damage your points.

Do you understand why this rule exists, though? If we didn’t have it, someone could achieve a high ranking, and then protect that rank by barely playing at all. Such a player would only play when s/he identified opponents or situations that were extremely favorable matchups.

You have to force people to play a certain amount whether they want to or not. It doesn’t have to be a lot, but some.

This is further explained in the Wikipedia entry on Elo:


Some of the clash of agendas between game activity, and rating concerns is also seen on many servers online which have implemented the Elo system. For example, the higher rated players, being much more selective in who they play, results often in those players lurking around, just waiting for “overvalued” opponents to try and challenge. Such players because of rating concerns, may feel discouraged of course from playing any significantly lower rated players again for rating concerns. And so, this is one possible anti-activity/ anti-social aspect of the Elo rating system which needs to be understood. The agenda of points scoring can interfere with playing with abandon, and just for fun.
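One simple way to enforce a minimum-activity rule like #2 is a fixed penalty for every under-active week. This is a minimal sketch only: the threshold, penalty, and floor below are invented for illustration and are not the values any real game uses.

```python
# Hypothetical rule #2 parameters: the post does not state the real values.
MIN_GAMES_PER_WEEK = 3
PENALTY_PER_IDLE_WEEK = 15.0
RATING_FLOOR = 1000.0


def apply_activity_decay(rating: float, games_per_week: list[int]) -> float:
    """Subtract a fixed penalty for every week with fewer than the minimum games."""
    for games in games_per_week:
        if games < MIN_GAMES_PER_WEEK:
            # Idle week: the rating slides down, but never below the floor.
            rating = max(RATING_FLOOR, rating - PENALTY_PER_IDLE_WEEK)
    return rating


# A 2000-rated player who stops playing for four weeks:
print(apply_activity_decay(2000.0, [0, 0, 0, 0]))  # 1940.0
```

A high-rated player can no longer "sit" on a rating indefinitely, which is exactly the behaviour the rule is meant to prevent, and also exactly what the casual-player objections in this thread are about.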

Great post. This reminds me of the old trick on BNet of people yanking their physical internet connection if they thought they were going to lose, so that their stats wouldn’t be affected.

There were still some jokes about the BNet matching system though.
http://www.penny-arcade.com/comic/2002/07/26

You say that obliterating players who are much lower than you is generally boring, and yet a significant portion of MMORPG players do nothing but that. Of course, there will always be jerks in video games, but it really does reduce the fun factor.

EVE Online does have a mechanism to combat this kind of trouble. Generally the idea is that the lower you are, the more you should stay in “Secure” space. If someone attacks you in “Secure” space then the cops show up and convince them of the error of their ways.

Hey Now Jeff!
Who wants to beat up a wharf rat anyway? Halo 3 is everywhere, it seems. Chess ratings are interesting too: say, a beginner at around 1000 and a master at around 2400. This post makes me think more about how they are calculated.
Coding Horror Fan,
Catto
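For the "how they are calculated" question: the widely used logistic form of the Elo expected score turns a rating difference into a predicted score, with a 400-point gap meaning roughly 10-to-1 odds. A minimal sketch (the function name is just for illustration):

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win probability, counting a draw as half) for A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# The beginner (~1000) vs. master (~2400) example from the comment above:
print(f"{elo_expected_score(1000, 2400):.4f}")  # 0.0003
print(f"{elo_expected_score(2400, 1000):.4f}")  # 0.9997
```

The two expectations always sum to 1, and equal ratings give 0.5, the "even match" the post is aiming for.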

An even win/loss ratio sounds like a good goal on paper, but there are a lot of players that are going to want more/less of a challenge than those odds provide. If the system can accurately predict the probability of winning or losing a match, why not allow the player to select a desired difficulty and have the system attempt to match it?
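If the system can predict win probability, the suggestion above amounts to choosing the opponent whose predicted outcome is closest to the player's chosen target. A sketch assuming plain Elo predictions and a hypothetical pool of waiting players (names and ratings are made up):

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def pick_opponent(my_rating: float, target_win_prob: float, pool: dict[str, float]) -> str:
    """Return the waiting player whose predicted win probability for me is closest to the target."""
    return min(
        pool,
        key=lambda name: abs(elo_expected_score(my_rating, pool[name]) - target_win_prob),
    )


pool = {"alice": 1400.0, "bob": 1600.0, "carol": 1800.0}
print(pick_opponent(1600.0, 0.50, pool))  # bob: an even match
print(pick_opponent(1600.0, 0.75, pool))  # alice: an easier game (about 76%)
print(pick_opponent(1600.0, 0.25, pool))  # carol: a harder game (about 24%)
```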

FYI: most Xbox games don’t show your TrueSkill ranking for the game. However, Settlers of Catan on Xbox Live does have a screen for it.

I was amazed that it has so few ‘levels’, but with the explanation above I now see that it must be showing mu, while sigma is still hidden.
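TrueSkill keeps two numbers per player, and a single displayed figure can hide sigma. Its commonly used defaults are mu = 25 and sigma = 25/3, and leaderboards often rank by the conservative estimate mu - 3·sigma. Whether the Catan screen shows mu or such a conservative value isn't stated here; this sketch just shows how one number can summarise both.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    mu: float = 25.0          # TrueSkill's commonly used default mean skill
    sigma: float = 25.0 / 3   # default uncertainty about that skill


def conservative_skill(r: Rating, k: float = 3.0) -> float:
    """mu - k*sigma: a skill level the player is very likely to be at or above."""
    return r.mu - k * r.sigma


new_player = Rating()                     # unknown player: estimate is about 0
veteran = Rating(mu=30.0, sigma=1.5)      # many games played, little uncertainty
print(conservative_skill(veteran))        # 25.5
```

A new player starts near zero and "climbs" mostly because sigma shrinks, which is one reason early progress feels fast.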

why not allow the player to select a desired difficulty and have the system attempt to match it?

That’s a good point-- if you’re always playing people of the exact same skill level, it will take forever to move up (or down) through the ranks.

To compensate for this, many implementations of Elo use a higher “K-value” for new players, which means they can win (or lose) more points for each game and arrive at their proper skill rating sooner. FIDE uses the following ranges:

K = 25 for a player new to the rating list until he has completed events with a total of at least 30 games.
K = 15 as long as a player’s rating remains under 2400.
K = 10 once a player’s published rating has reached 2400, and he has also completed events with a total of at least 30 games. Thereafter it remains permanently at 10.

This way, new players can see rapid changes in their rank, which is more satisfying.
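The K ranges quoted above plug into the standard Elo update, where a result moves the rating by K times (actual score - expected score). A sketch following those quoted ranges (the function names are illustrative, and FIDE's current rules may differ):

```python
def fide_k_factor(games_played: int, published_2400: bool) -> int:
    """K per the ranges quoted above."""
    if games_played < 30:
        return 25   # new to the rating list: fewer than 30 rated games
    if published_2400:
        return 10   # published rating has reached 2400: K stays at 10 permanently
    return 15       # established player whose rating is under 2400


def elo_update(rating: float, opponent: float, score: float, k: int) -> float:
    """Standard Elo update: score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return rating + k * (score - expected)


# The same result (a win against an equal opponent) with different K values:
print(elo_update(1500.0, 1500.0, 1.0, fide_k_factor(5, False)))    # 1512.5
print(elo_update(1500.0, 1500.0, 1.0, fide_k_factor(100, False)))  # 1507.5
```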

For this skill system to work, you’ll need each console to have a breathalyzer attached, and an appropriate multiplier…

Has anyone else heard about the rather serious bug in Excel 2007? It might be worth a column, Jeff.

http://groups.google.com/group/microsoft.public.excel/browse_thread/thread/2bcad1a1a4861879/2f8806d5400dfe22?hl=en#2f8806d5400dfe22

One thing I disagree with is rule #2, that not playing the req’d amount of games/week will damage your points.

Do you understand why this rule exists, though?

I played on a system like that. But not often enough, so I was constantly underrated. It’s a broken system. You can make a rating system that is an estimation of skill level, or a system of rewards. They don’t mix.

I like what you wrote about Microsoft’s TrueSkill. It sounds a lot like the Glicko system by Mark Glickman. If a player doesn’t play very often, it should raise his RD (rating deviation), not lower his rating. If there is a public ranking of players, then after some period of inactivity, a player should simply disappear from it, not be punished by being underrated.
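In Glicko-1 this is a single step: before each rating period, RD grows with the time since the player last played, capped at 350 (the RD of a brand-new player), while the rating itself is left alone. A sketch; the constant c is a tuning choice, and the "back to 350 after 100 idle periods" assumption below is only for illustration.

```python
import math

RD_MAX = 350.0  # Glicko-1 RD for a completely unknown (new) player


def inflate_rd(rd: float, idle_periods: int, c: float) -> float:
    """Grow rating deviation with time away; the rating itself is untouched."""
    return min(math.sqrt(rd * rd + c * c * idle_periods), RD_MAX)


# Assumption: a well-established RD of 50 should drift back to 350 after 100 idle periods.
C = math.sqrt((RD_MAX ** 2 - 50.0 ** 2) / 100)
print(round(C, 1))                        # 34.6
print(round(inflate_rd(50.0, 10, C), 1))  # 120.4
```

A higher RD then makes the next few results count for more, so a returning player is re-estimated quickly instead of being pinned to a stale or artificially lowered number.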

(And to tell the truth, I actually took a certain perverse pleasure in being underrated.)

Anyway, thank you for posting about Elo! Maybe now more games will consider it. I’ve always wished that Quake had it (count every kill as a win), instead of dealing with skill imbalance by making a shallower game (simpler maps, weapons balance, etc.)
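"Count every kill as a win" can be sketched by replaying a match's kill log as a series of one-on-one Elo games. The K value and starting rating here are hypothetical, and a real deathmatch system would have to decide how to treat suicides, team kills, and so on.

```python
from collections import defaultdict


def rate_kills(kills: list[tuple[str, str]], k: float = 16.0, start: float = 1500.0) -> dict[str, float]:
    """Treat each (killer, victim) event as one game the killer won."""
    ratings: defaultdict[str, float] = defaultdict(lambda: start)
    for killer, victim in kills:
        expected = 1.0 / (1.0 + 10 ** ((ratings[victim] - ratings[killer]) / 400.0))
        delta = k * (1.0 - expected)
        ratings[killer] += delta   # killer gains what the victim loses,
        ratings[victim] -= delta   # so total rating points are conserved
    return dict(ratings)


print(rate_kills([("a", "b")]))  # {'b': 1492.0, 'a': 1508.0}
```

Because an upset kill of a much stronger player is worth far more than farming a weak one, this rewards skill directly rather than relying on map or weapon balance to hide the gap.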

Another strategy for matching skills is handicapping, but to do that well you need a good rating system first.

It’s a shame, though, that most games that are enlightened enough to have an unbroken Elo-style rating system don’t bother to calculate your win expectancies for you.

Hey Jeff, I’m an avid Magic player (I’ve even competed on the Pro Tour), and they use a form of the Elo system there. What’s neat in their case is the K-value: effectively the highest number of rating points you can win or lose in a single match at a given tournament.

Different types of tournaments are given different K-values, which does a couple of things. Small weekly or entry-level tourneys use K=8, larger monthly events use K=16, and qualifiers and quarterly events use K=32.

Highly-ranked players can attend small events with much weaker players and only risk a small amount of their ratings points. Likewise, beginners tend to attend small events to start, so they only lose small chunks. Bigger tourneys allow the beginners to have “breakout” events where a good winning-streak will give you a massive ratings boost. And highly-ranked players can actually make rating points against other highly-ranked players during the big events.
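The effect of event-specific K on risk can be shown with plain Elo arithmetic. A sketch using the three K values mentioned above; the 1900 and 1500 ratings are made up, and the real DCI formula may differ in details.

```python
EVENT_K = {"weekly": 8, "monthly": 16, "qualifier": 32}  # K values from the comment


def rating_change(rating: float, opponent: float, score: float, k: float) -> float:
    """Elo change; its magnitude is always below K, matching the 'most you can win/lose' idea."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return k * (score - expected)


for event, k in EVENT_K.items():
    loss = rating_change(1900.0, 1500.0, 0.0, k)
    print(f"{event:>9}: a 1900 player losing to a 1500 player changes by {loss:+.1f}")
#    weekly: ... -7.3
#   monthly: ... -14.5
# qualifier: ... -29.1
```

The same upset costs four times as much at a qualifier as at a weekly event, which is why strong players can drop into small local events cheaply.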

The biggest limitation tends to be at the high levels of a regional ladder. If I’m ranked, say, 3rd in my province or state, there may be only a dozen local people who are really “worth playing”, so I have to start going to national or Pro Tour events to meet evenly matched players. There are also different formats of play, so if some format is unpopular in your area, you can quickly end up at the top of the regional chain but nowhere in competition nationally. At some point, the better players have to attend national events and gain rating points so that they can “bring them back” to the regional level.

They deal with Problem #2: Decay by simply removing players who are “no longer active” (the period is a little long: 1 year), but the concept is the same: if you don’t play, you get dropped from the ladder. If you start playing again, you come back in at your old rating.
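Removing inactive players without touching their ratings is essentially a filter on the public ladder. A sketch under that reading; the `Player` type, names, and dates are hypothetical, and only the one-year window comes from the comment.

```python
from dataclasses import dataclass
from datetime import date, timedelta

INACTIVITY_WINDOW = timedelta(days=365)  # the "1 year" mentioned above


@dataclass
class Player:
    name: str
    rating: float
    last_played: date


def visible_ladder(players: list[Player], today: date) -> list[Player]:
    """Active players sorted by rating; inactive ones are hidden, not re-rated."""
    active = [p for p in players if today - p.last_played <= INACTIVITY_WINDOW]
    return sorted(active, key=lambda p: p.rating, reverse=True)


players = [
    Player("ann", 1850.0, date(2007, 9, 1)),
    Player("ben", 1920.0, date(2006, 6, 1)),  # idle for more than a year
]
print([p.name for p in visible_ladder(players, date(2007, 9, 26))])  # ['ann']

# ben plays again and reappears at his old rating of 1920:
players[1].last_played = date(2007, 9, 26)
print([p.name for p in visible_ladder(players, date(2007, 9, 26))])  # ['ben', 'ann']
```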

However, I really like the concept of having a sigma (a deviation) built into the model. Magic suffers from having hard rating numbers. So if invitations go to players with a 1900 rating as of a date 3 weeks from now, players have been known to skip ranked tourneys for those 3 weeks to make sure they don’t dip below 1900 and lose their bonus.

But this failing actually comes from a difference between chess and Magic: chess does not involve luck, and Magic does. Chess players don’t have to worry about some beginner “getting lucky”; with Magic, some days you’re just going to lose.

And this is where some type of sigma would be nice, even if it were just a “momentum buffer” so that you didn’t get raked. I’ve gone 7-1 on the day and lost rating points. Everyone I played was clearly rated well beneath me, but even an average player can “get lucky” and beat an excellent player (which even he admitted was the case). At some point the system should distinguish “got unlucky” or “got lucky” from “this player is under/over-rated”.
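Under plain Elo, a 7-1 day against much weaker opposition really can come out negative. A worked sketch, assuming a 1900 player, eight opponents all rated 1500, and K = 16; the commenter's actual ratings and K aren't given.

```python
def expected_score(rating: float, opponent: float) -> float:
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def play_day(rating: float, results: list[float], opponent: float, k: float) -> float:
    """Apply one Elo update per match, in order, against equally rated opponents."""
    for score in results:
        rating += k * (score - expected_score(rating, opponent))
    return rating


start = 1900.0
# Seven wins worth about +1.4 each, then one loss worth about -14.6.
end = play_day(start, [1, 1, 1, 1, 1, 1, 1, 0], opponent=1500.0, k=16.0)
print(f"Net change after going 7-1: {end - start:+.1f}")  # a few points negative
```

Elo treats the single upset as strong evidence of over-rating; a sigma-style model could instead attribute some of it to variance, which is the distinction the comment asks for.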

It’s nice to know that someone has a sigma/distributed approach, I’d love to see Magic take this on.

Video games are positive reinforcement systems. Players want to be told how great they are, all the time. You have to have progress, score, bonuses, wins, captures, or a higher position on the ranking ladder. 50% win rates are less fulfilling than the long string of wins that came before it to reach one’s current ranking.

There are significant problems with high-rated players sniping “overvalued” opponents, but that’s minimized by randomly selecting opponents. So instead, highly rated players will often create secondary accounts for “having fun”. They maintain a lower rating on those other accounts so they can have matches that they are much more likely to win (and therefore enjoy). This practice is so common that it has its own jargon, “smurfing”. A top-rated player’s alternate account is his smurf.

If it’s at all feasible to challenge specific players, it allows a sinister way of hacking the system. A highly rated player can stay dominant by using their smurf to challenge an equally highly rated competitor. A loss doesn’t matter, because it’s on the low-rated smurf account, but a win significantly damages the competitor’s rating because it was a lower-rated account beating a higher rated one.
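The arithmetic behind this attack, as a sketch assuming plain Elo with K = 32 and made-up ratings (2200 for the competitor, 1400 for the smurf): losing to the smurf costs the competitor about twice what losing to an equal peer would, while the smurf's own rating is irrelevant to its owner.

```python
def rating_change(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """Plain Elo change for the player rated `rating` (hypothetical K = 32)."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return k * (score - expected)


competitor = 2200.0
loss_to_peer = rating_change(competitor, 2200.0, 0.0)   # beaten by an equal player
loss_to_smurf = rating_change(competitor, 1400.0, 0.0)  # beaten by a 1400-rated smurf
print(f"{loss_to_peer:+.1f}")   # -16.0
print(f"{loss_to_smurf:+.1f}")  # -31.7
```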

So random matchups are crucial to minimize the negatives of smurfing. But it can’t be purely random. There’s simply no point in matching a top ten player against someone who just started playing. So the matching systems limit the range of ratings within which it will find an opponent.

Again, this has a negative side effect. Unfortunately, the highest rated players become a rarefied breed. They’re far off on the right edge of the bell curve, and aren’t as likely to be online at the same time. A random matching system that limits the match range effectively allows the top players to arrange matches to their benefit.

Each time the gaming population perceives unfairness in the system, their opinion of the meaning of the ratings goes down, and they reject it.

Great post as usual, Jeff, and the first one I’ve felt like commenting on. I like A. Fountain’s idea of allowing a user to select a difficulty level, with one caveat: the matchup shouldn’t pair someone who selected an easy difficulty with a lower-skilled player who did not select a hard difficulty. I like the idea that a novice who is really an advanced player, but doesn’t yet have the stats to show it, can challenge higher-ranked players and rise accordingly (pulling down the ratings of the higher-ranked players who lose to them) rather than sit around grinding wharf rats up the ladder.

Why have people lose rank if they can’t play all the time? This is punishing to the casual player. Take a look at Guild Wars, they have a system working that allows casual and hardcore gamers to play together.

Akira, imagine if in the chess world you could never challenge the best grandmaster (I’m not really sure of chess terms, so bear with me, or cringe if you will): they would remain the undefeated champion not because they were necessarily better than everyone else, but simply because they beat everyone at one stage and then refused to play anymore. The idea of decaying rank is that in the online world players can’t always be available for challenges, so the system lets players play when they choose, provided they play regularly enough to prove they are still worthy of their rank.

I don’t think decaying ratings exist for any of the reasons described above. They’re there to keep players playing. Note that Supreme Commander (an RTS) doesn’t have ratings decay, but MMOs do. They need to keep players hooked and paying that monthly subscription fee.