forum.webdiplomacy.net

webDip dev coordination forum / public access todo list

All times are UTC




Posted: Wed Oct 01, 2008 10:21 pm

Joined: Sat Sep 27, 2008 12:00 am
Posts: 44
Hello everybody. I read (without getting into too much detail) the ELO ranking system thread. I thought of opening this new thread to propose yet another ranking system. I know it could sound like a redundant effort (why a third thread on ranking?!), but if you take the time to read on, you will discover that the method I am about to propose is in some ways an evolution of the standard ELO model. The first couple of paragraphs contain a lot of information that has already been presented in other threads, but I thought it was handy to compile the most important points in a concise form here, so that newcomers can contribute to the discussion without having to read 5 pages of technical posts.

Why should we change the present ranking system?
The present ranking system has two main advantages: it is simple and intuitive, and it manages to keep the "seasoned players" apart from the "newcomers", hence guaranteeing a certain quality (no double-accounters, fewer CDs) in games with entry fees above 100 points.
The main problem, though, is that it does not measure the performance of a player in terms of "how good he/she is at playing Diplomacy". In fact, the only information it gives is "how good a player is at placing winning bets". Saying that players with more points are better Diplomacy players is pure assumption. An example can clarify: a player with ~800 points could have built his capital in either of two ways:
  • By playing 1 game to get past 100 points + 1 winner-takes-all game in which he bet 100 points and 6 inexperienced newcomers joined.
  • By playing 20 different PPSC games with super-professionals and having managed to survive with a decent number of SCs in all of them.
The points would be ~800 in both cases, but the skills involved are probably at opposite ends of the spectrum.
In essence, we need a new ranking system that measures the performance of players in terms of skill at playing Diplomacy.

What should such system rely on?
Given that skill is an immaterial ability that can't be measured directly, the focus must necessarily be on relative performance. This differs from absolute performance (number of wins, draws and losses) in that it takes into account how difficult it was for the ranked player to beat the other players. Again, an example can help: if Player A wins a game against 6 newcomers playing their first game, that win will not contribute as much to his ranking as a draw with the world Diplomacy champion would. Conversely, losing against a champion will not hurt his rank as much as losing against 6 newcomers.
So far, the ELO system would perfectly match these criteria.
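To make the "relative performance" idea concrete, here is a minimal sketch of the textbook two-player ELO update. The function names, the K value of 32 and the sample ratings are assumptions chosen for illustration; none of them come from phpDiplomacy, and a seven-player Diplomacy game would need some pairwise or multi-player extension that is not shown here.

```python
def elo_expected(r_player, r_opponent):
    """Expected score (between 0 and 1) of the player against the opponent."""
    return 1.0 / (1.0 + 10 ** ((r_opponent - r_player) / 400.0))


def elo_update(r_player, r_opponent, score, k=32.0):
    """New rating after one game; score is 1 for a win, 0.5 for a draw, 0 for a loss.

    k is the adjustment factor (32 is only an assumed, commonly seen value).
    """
    return r_player + k * (score - elo_expected(r_player, r_opponent))


# Hypothetical ratings: an average player, a newcomer and a champion.
player, newcomer, champion = 1500.0, 1200.0, 2200.0

gain_win_vs_newcomer = elo_update(player, newcomer, 1.0) - player
gain_draw_vs_champion = elo_update(player, champion, 0.5) - player
loss_vs_champion = player - elo_update(player, champion, 0.0)
loss_vs_newcomer = player - elo_update(player, newcomer, 0.0)
```

With these assumed numbers, the draw against the champion is worth more than the win against the newcomer, and the loss against the champion costs less than the loss against the newcomer, which is the behaviour described above.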
There are, however, certain limits to the ELO system that have been debated at length in other threads. To list some of them:
  • Newcomers would possibly get "inflated" rankings, because the dataset they are ranked upon is very small
  • It is difficult to tune the scale so that adjustments are not too abrupt (this is especially true for top-rankers)
  • There are plenty of "magic numbers"
These objections and some other practical considerations are the reason why I believe that, instead of adopting the original ELO system, we should adopt a system based on the Glicko-2 algorithm: a derivative of the ELO system that introduces two important new parameters: the rating reliability (or rating deviation), which essentially measures how accurate a rating is, and the rating volatility, which essentially measures the degree of expected fluctuation in a player's rating.
In particular, I recently stumbled upon Microsoft's own system built on similar ideas (Glicko-2 itself is in the public domain, so no licensing problems there). Its name: TrueSkill (tm).
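To show what the rating deviation does in practice, here is a short sketch of the simpler Glicko-1 update (not Glicko-2, which adds the volatility term, and not TrueSkill). It is shown only because it fits in a few lines; the function names are my own, and the head-to-head result format is an assumption, since Diplomacy games would first have to be broken down into such results.

```python
import math

Q = math.log(10) / 400.0  # Glicko scaling constant


def g(rd):
    """Discount factor: results against opponents with uncertain ratings count less."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def expected(r, r_opp, rd_opp):
    """Expected score against an opponent, discounted by the opponent's deviation."""
    return 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400.0))


def glicko1_update(r, rd, results):
    """Return (new_rating, new_rd) after one rating period.

    results is a list of (opponent_rating, opponent_rd, score) tuples,
    with score 1 for a win, 0.5 for a draw and 0 for a loss.
    """
    if not results:
        return r, rd
    # Inverse of d^2: how much information the period's games carry.
    d2_inv = Q ** 2 * sum(
        g(o_rd) ** 2 * expected(r, o_r, o_rd) * (1.0 - expected(r, o_r, o_rd))
        for o_r, o_rd, _ in results
    )
    # Weighted sum of (actual score - expected score).
    delta = sum(g(o_rd) * (s - expected(r, o_r, o_rd)) for o_r, o_rd, s in results)
    denom = 1.0 / rd ** 2 + d2_inv
    return r + (Q / denom) * delta, math.sqrt(1.0 / denom)


# Player rated 1500 with deviation 200: one win and two losses.
new_r, new_rd = glicko1_update(
    1500.0, 200.0, [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)]
)
```

For this sample period the rating drops to about 1464 and the deviation shrinks from 200 to about 151: every game played makes the rating more reliable, which is what tames both the "inflated newcomer" and the "abrupt adjustment" problems.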

Why TrueSkill can offer an advantage over ELO
Microsoft's system was designed specifically for multiplayer online games (Xbox Live), with the aim of finding opponents of matching skill to play with. This is, in essence, the same goal that we want to achieve. Whether we use Glicko-2 or TrueSkill, both of them:
  • Take into account the amount of experience / consistency of results of the player (solving at once the problems related to newcomers and top-rankers)
  • Have far fewer "magic numbers"; in particular, they do not need a staggered K value (the relative adjustment value that normally differs between regions of the ladder)
  • Provide valuable, immediate information for finding opponents of equal strength.

Practical implementation for phpDiplomacy
In order to retain the advantages of the current points system (simplicity/intuitiveness and keeping seasoned players apart from newcomers), it would be possible:
  • To create a discrete scale of ranks (this might even correspond to the medal system proposed elsewhere) rather than displaying the actual output of the calculation of one's rank
  • Each game could be targeted to players ranking from X to Y and only players with ranks between X and Y would be allowed to join.
  • When creating a new game, the creator of the game (ranked A) would be able to set X>=A and A>=Y>=A-K, where rank 1 is the best rank, X is the lowest rank admitted and Y is the highest rank admitted.
An example can clarify greatly. Suppose we have a system based on 20 ranks, I am ranked 7, and K was agreed to be 2:
  • When joining a game - this is intuitive - I will be allowed to join only if 7 is within the range of ranks allowed for that game
  • When creating a game, I will only be able to create games with a lower entry point that I can set anywhere between 20 and 7, and an upper limit between 7 and 5 (7-2).
The second point clearly illustrates how professional players and newcomers can be kept apart in this way. Only occasionally, and only if a pro player wishes it, can there be a game open to everybody (in the example above, it would take a player ranked 3 to be able to have a game open to ranks 20 to 1). Yet such a game would be a rare exception, as a pro player would gain very little even by winning it WTA.
To be real perfectionists, when creating a new game and setting the range, a little box could tell the creator (before he hits "create game"!) how many registered players currently fall in that range. This would, for example, prevent a pro player from creating a game that only 5 other players could join...
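The rules above can be sketched in a few lines. This is only an illustration under the assumptions of the example (rank 1 is the best rank, 20 ranks in total); the function names are hypothetical and nothing here exists in phpDiplomacy.

```python
def creator_limits(creator_rank, k, num_ranks=20):
    """Return the allowed choices for X (lowest rank admitted) and Y (highest rank admitted).

    Rank 1 is the best rank, so X >= creator_rank and
    creator_rank >= Y >= creator_rank - k (never better than rank 1).
    """
    if not 1 <= creator_rank <= num_ranks:
        raise ValueError("creator_rank must be between 1 and num_ranks")
    x_choices = range(creator_rank, num_ranks + 1)
    y_choices = range(max(1, creator_rank - k), creator_rank + 1)
    return x_choices, y_choices


def is_valid_game(creator_rank, x, y, k, num_ranks=20):
    """Check whether a creator of the given rank may open a game for ranks x..y."""
    x_choices, y_choices = creator_limits(creator_rank, k, num_ranks)
    return x in x_choices and y in y_choices


def can_join(player_rank, x, y):
    """A player may join when his rank lies within the range, limits included."""
    return y <= player_rank <= x


def eligible_count(all_ranks, x, y):
    """Number of registered players who could join: the "little box" shown to the creator."""
    return sum(1 for rank in all_ranks if can_join(rank, x, y))


# The example from the post: 20 ranks, creator ranked 7, K = 2.
x_choices, y_choices = creator_limits(7, 2)
```

Here `x_choices` covers ranks 7 to 20 and `y_choices` covers ranks 5 to 7, matching the example; a creator ranked 3 is the lowest-ranked player who can open a game to ranks 20 to 1.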

Conversion from present system and fine-tuning
  • I personally believe that there is little if any sense in keeping a double system (D-points and Glicko-2 ranks) going, as the objectives of the present system would be fully met by the new system, which would also bring new benefits.
  • We could use the database of past games to create the ranking, so that - effectively - players would not be required to start all over from scratch: they would simply receive a ranking based on their past records.
  • A "beta testing" phase for the ranking system is a must: this should be run with real data from phpDiplomacy.net, but on a second site (only with ranking stats). This could be achieved, IMO, in the following way: Kestas could create a second table in his SQL or PostgreSQL server, with two users: an admin one that - upon cron calls - will periodically synchronise the statistics of the players on phpDiplomacy into the new table, and a read-only user that will be used by the testing team to access those data and process them according to the new algorithm.

Features of the beta testing site
The beta testing site should not contain a full installation of phpDiplomacy, as it is not meant to be a playable community. It should only have:
  • Complete list of players with ranking.
  • Search function (by player name and by ranking)
The beta testing should refine the model keeping an eye on two factors:
  • The distribution of players in the ranks should follow a Gaussian/normal distribution
  • The feedback from players: it would be important to set up a few games between players of similar ranks; their feedback (and the evolution of the gameboard) should indicate a higher level of competition / better balance between players.


I apologise if I have been lengthy! :)


 
Posted: Thu Oct 02, 2008 7:38 am
Site Admin

Joined: Sat Jun 28, 2008 6:24 am
Posts: 892
Thanks for the in-depth post, I only knew the basics of the TrueSkill system

My problems:
  • We have 2 other suggestions for ranking systems already; even if this is the best who's to say someone won't come along with something better (as you claim to have done to Elo)
  • The main problem you bring up is that the points system doesn't actually represent skill. This is known, and it's true of all systems to a lesser or greater extent. It's fine for it to be an approximation or ballpark estimate
  • You say the new system could replace the points system, but I'm very skeptical of this. I refer you to the Elo thread for an exhaustive list of functions which the points system performs; it's hard to get a system that does them all just as well and ranks better
  • It's not such a big deal, really; there are better things to do and I really doubt that, despite all the discussion, any developer will actually take the time to implement a new points system

Basically I think the idea of a different points system is just not feasible at present. It sucks to have to reject systems which may have taken time to work on, but there are features which could be added which would be new and not replacing anything, which would be much more likely to be considered
If you're talking about replacing something, and replacing something as fundamental to the community as the points system, it's not as likely to get accepted.

So in a nutshell my response is still the standard FAQ response regarding new points systems :(


 
Posted: Thu Oct 02, 2008 10:12 am

Joined: Sat Sep 27, 2008 12:00 am
Posts: 44
kestasjk wrote:
Thanks for the in-depth post, I only knew the basics of the TrueSkill system

I suppose we all owe kudos to you for the work done so far. I actually consider contributing to ongoing discussions a pleasure. It's the very special fun of free software and open communities, at least for me! :)

In this spirit I have replied to your points below, but I want to be clear: my intent is not to "force" you to implement what I propose; my intent in having this dialogue is to improve our mutual understanding and to make good food for thought available to other members of this community. So...

Quote:
We have 2 other suggestions for ranking systems already; even if this is the best who's to say someone won't come along with something better (as you claim to have done to Elo)

Yes, I read in the other thread that your main criterion for implementing or not implementing a system is not whether it is better but whether it ends the discussion of the ranking system. It is my understanding that you are the sole (or nearly the sole) developer of phpDiplomacy, so I can understand where you are coming from with that statement. So: yes, I do not claim to have come up with the discussion-killer... indeed my goal was to foster a discussion on this system too, as is already happening for ELO! :D

I still think - though - that the advantage of having a ranking system that is - more or less objectively - linked to the performance of players (ELO, Glicko-2, TrueSkill...) is dramatic compared with the current situation. The D-points in fact do not gauge one's performance as a Dip player, but one's performance as a gambler, i.e. the capacity of players to bet their points on winning horses.

The key reason for me to advocate a change of the D-points system is that a ranking system is also a matching system, and a better matching system means players can start games where skills are approximately even, thus generating much more challenging dynamics, more fun, the possibility to develop one's skills faster, etc... In a sentence: a better matching system will greatly increase the users' fun.

The latter statement is why I think the beta-phase (of any new proposed system) should consider feedback of users with similar ranks playing test-games [if the system fails at creating more challenging games, 80% of its scope will be gone].

Quote:
The main problem you bring up is that the points system doesn't actually represent skill. This is known, and it's true of all systems to a lesser or greater extent. It's fine for it to be an approximation or ballpark estimate.

I knew it was known, and I obviously agree on the fact any system will be an approximation.

What seems not to have been clarified enough in other threads is that, while there is a general consensus that measuring performance (and especially relative performance) is a good approximation of skill, the D-points system has nothing at all to do with that. D-points measure the absolute ability of players to place winning bets.

An analogy might help to understand the problem: imagine you are to measure "how hot the weather is". There are many scales you could use: degrees Celsius, degrees Fahrenheit, kelvins... or even scales combining temperature with humidity, thus providing the "feels like..." indication. All these scales would be the ELO, Glicko, Glicko-2, TrueSkill, etc... In this analogy, D-points would be like establishing "how hot it is" by the number of trees dying that day. Sure... one could hold that if it is too hot plants tend to dehydrate and die, but you must admit that it is a long stretch to say that the number of dying trees is a measure of "how hot it is". Plants could die because it has not rained for the past two months, because of a parasite epidemic, because of a pollutant in the water table...

In the same way, one can hold that if a player is good, he/she will win more, and that - holding the average bet per game constant - a player who wins more will have more D-points... but - as in the example before - the number of D-points can be affected far more by other factors than by playing skill, such as the player's "attitude to risk", the luck of often finding less skilled opponents, the importance a player attributes to the points, etc...

So let me reiterate for our readers: D-points do not measure a player's performance and because of this can't be considered in the same class as the widely used ranking systems.

[Please forgive the wild use of colour and bold... but since I think this was not clarified enough in other threads, I wanted it to stand out very clearly for other readers that might simply skim these posts]

Quote:
You say the new system could replace the points system, but I'm very skeptical of this. I refer you to the Elo thread for an exhaustive list of functions which the points system performs; it's hard to get a system that does them all just as well and ranks better

I need a bit of help in finding the exhaustive list of functions (maybe because the thread is so intricate and filled with quotations, I could not find any function that was not already discussed in my previous post). As for ranking better, the theory seems to clearly indicate this, but the goal of the proposed beta-phase would be exactly to collect empirical evidence of it. Incidentally, one could then compare the ranked list of players as it is (D-points) with the one as it will be (Glicko-2 or anything else). This would be the ultimate test to know if D-points have any relationship at all with performance or not.
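One possible way to put a number on that comparison is Spearman's rank correlation between the two lists. The sketch below is only a suggestion: the function name and the sample players and scores are made up, and it assumes no two players share exactly the same score in either system.

```python
def spearman_rho(scores_a, scores_b):
    """Spearman rank correlation between two {player: score} dicts (no ties assumed).

    Returns 1.0 when both systems order the players identically
    and -1.0 when one ordering is the exact reverse of the other.
    """
    players = sorted(scores_a)
    if set(players) != set(scores_b) or len(players) < 2:
        raise ValueError("both dicts must cover the same two or more players")

    def ranks(scores):
        # Rank 1 goes to the highest score.
        ordered = sorted(players, key=lambda p: scores[p], reverse=True)
        return {p: i + 1 for i, p in enumerate(ordered)}

    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    n = len(players)
    d_squared = sum((rank_a[p] - rank_b[p]) ** 2 for p in players)
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Made-up data: D-points versus a hypothetical new rating for three players.
d_points = {"ann": 800, "bob": 300, "cat": 120}
new_rating = {"ann": 1400, "bob": 1500, "cat": 1700}
rho = spearman_rho(d_points, new_rating)
```

A value close to 1 would mean D-points already order players much like the performance-based system does; a value near 0 would mean the two have little to do with each other.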

Quote:
It's not such a big deal, really; there are better things to do and I really doubt that, despite all the discussion, any developer will actually take the time to implement a new points system.

I agree it is not a big deal. Moreover, I think you pointed out the most important thing: we (users of phpDip) can't expect you to implement everything our community members come up with, however good those ideas may be. There are logical priorities (fixing security bugs, to name one) but also legitimate preferences (phpDip is your hobby and creature, and if you really prefer spending time on finding a better colour palette rather than implementing the foo function, this is totally ok!).

Quote:
Basically I think the idea of a different points system is just not feasible at present. It sucks to have to reject systems which may have taken time to work on, but there are features which could be added which would be new and not replacing anything, which would be much more likely to be considered
If you're talking about replacing something, and replacing something as fundamental to the community as the points system, it's not as likely to get accepted.

As mentioned in the opening of this post, my intent in replying to you was not to "force" you to implement my idea, but rather to articulate why I think the idea is worth consideration... so I am ultimately happy with your answer.

Final thought. In the ELO thread, I read that you can nevertheless make the data collected so far available for testing a system... So this is a call for any other member of the community out there:

Oh honourable Member of this community, if you have some time, will, and mathematical understanding (and possibly knowledge of PHP), why don't we try to take this idea a level further and experiment with actual numbers? If you are interested... post your valuable words of wisdom here below! ;)

