forum.webdiplomacy.net

webDip dev coordination forum / public access todo list

All times are UTC




PostPosted: Mon Sep 08, 2008 4:39 pm 

Joined: Tue Aug 26, 2008 8:46 pm
Posts: 249
The Elo Rating system.

I have got further with Elo than I think Kestas is aware. I shall go through the whole thing from scratch, so it can be completely clear.

1. Standard games.

This is very easy to do, really:

New Rating = Old rating + V*(Result-Expected Result)

Expected Result= (player’s starting rating)/(Sum of all players’ starting ratings)

“V” I shall get onto later.

Result (PPSC)=SCs at end/34

Result (WTA)= 0 for defeat/survival, 1/n for an n-player draw, 1 for a solo win.

Because the sum of results = 1 = sum of expected results, there is no inflation of points.
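A minimal Python sketch of the update rule above, assuming ratings are kept in a plain dictionary; the function names and the seven 100-rated players are hypothetical, chosen only for illustration.

```python
TOTAL_SCS = 34  # supply centres on the standard map
V = 25          # value of V suggested in section 4 of the post


def expected_results(ratings):
    """Expected result: own starting rating / sum of all starting ratings."""
    total = sum(ratings.values())
    return {name: r / total for name, r in ratings.items()}


def ppsc_results(final_scs):
    """PPSC result: supply centres held at the end, divided by 34."""
    return {name: scs / TOTAL_SCS for name, scs in final_scs.items()}


def wta_results(players, winners):
    """WTA result: 1 for a solo, 1/n for each member of an n-way draw, else 0."""
    share = 1 / len(winners)
    return {name: (share if name in winners else 0.0) for name in players}


def update_ratings(ratings, results, v=V):
    """New rating = old rating + V * (result - expected result)."""
    expected = expected_results(ratings)
    return {name: ratings[name] + v * (results[name] - expected[name])
            for name in ratings}


# Hypothetical game: seven players rated 100, player "A" solos in WTA.
ratings = {name: 100 for name in "ABCDEFG"}
new = update_ratings(ratings, wta_results(ratings, winners={"A"}))
print(new["A"], new["B"], sum(new.values()))
```

Because the results and the expected results each sum to 1, the total of all ratings is the same before and after the update, which is the no-inflation property claimed above.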


2. New players.

From the average performances in a first game, a new player’s level is 6/10ths of the average rating. It would therefore not make sense to have new players with a rating of 100, since that would simply inflate the ratings of those who play against them. Thus, a new player should have a rating of 60.

Sadly, if that were done and nothing else, the average would be reduced from 100 to 60, so everyone needs to be brought up to 100 points, but not necessarily all at once. New players should therefore start with a rating of sixty, but at the completion of every game be given 4 points, until they have completed 10 games. This means that the average is preserved without causing a results skew.

There is always the issue of “why did I get points when I lost a game”, which would probably be best tackled by just not showing Elo-ratings until ten games have been finished. That way we also get the advantage that we do not show unrealistic/unfounded Elo-ratings.
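The new-player schedule can be sketched as follows. This is only an illustration of the numbers in the post (start at 60, 4 points per completed game, ten games); the constant and function names are assumptions.

```python
START_RATING = 60    # proposed starting rating for a new account
BONUS_PER_GAME = 4   # points granted on completing each game
BONUS_GAMES = 10     # number of games that carry the bonus


def provisional_bonus(games_completed):
    """Total bonus granted so far: 4 per finished game, capped at 10 games."""
    return BONUS_PER_GAME * min(games_completed, BONUS_GAMES)


def baseline(games_completed):
    """Rating of a player whose results exactly matched expectation so far."""
    return START_RATING + provisional_bonus(games_completed)


def is_rated(games_completed):
    """Ratings stay hidden until ten games have been finished."""
    return games_completed >= BONUS_GAMES


for games in (0, 5, 10, 25):
    print(games, baseline(games), is_rated(games))
```

A player who always scores exactly the expected result moves from 60 to 100 over ten games and stays at 100 afterwards, which is how the schedule restores the intended average.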

3. Civil Disorder.

When a player enters civil disorder, they simply lose the game.

When a player takes over a nation in civil disorder, an adjustment must be made to the expected result. The obvious solution is:

Expected Result= (SC count of Civil Disorder nation/34)*(player’s starting rating)/(Sum of all players’ starting ratings)

When a player falls into civil disorder, the ratings used for that nation in the calculation need to be decided. I suggest you take

Rating = [(starting player's rating)*(turns he played for) + 0*(number of turns in civil disorder) + (1st new player's rating)*(turns he played for) + (2nd new player's rating)*(turns he played for) + …] / (total number of turns).


There is a definite risk that this will cause a slight inflation or deflation of ratings. If this happens, then the formulae can be adjusted to stop it from occurring. I would suggest a six-month period where it is described as a “beta version” because of the uncertainties. We shall have to see what happens in order to arrive at a mathematically perfect system, which is without any doubt what I want to achieve, and what we should aim for.
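One possible reading of the two civil disorder formulas, sketched in Python. The interpretation is an assumption: each player who held the nation contributes their rating weighted by the turns they played, turns in civil disorder contribute a rating of 0, and a replacement's expected result is scaled by the nation's SC count at takeover. The function names and example numbers are hypothetical.

```python
TOTAL_SCS = 34


def nation_rating(stints, cd_turns):
    """Turn-weighted rating for one nation.

    stints: list of (rating, turns_played), one entry per player who held it.
    cd_turns: turns the nation spent in civil disorder, counted as rating 0.
    """
    total_turns = sum(turns for _, turns in stints) + cd_turns
    weighted = sum(rating * turns for rating, turns in stints)  # CD turns add 0
    return weighted / total_turns


def takeover_expected(scs_at_takeover, rating, all_ratings):
    """Expected result for a replacement player, scaled by SCs at takeover."""
    return (scs_at_takeover / TOTAL_SCS) * rating / sum(all_ratings)


# Hypothetical nation: a 120-rated starter plays 6 turns, 4 turns of civil
# disorder follow, then a 90-rated replacement plays the last 10 turns.
print(nation_rating([(120, 6), (90, 10)], cd_turns=4))   # 81.0
print(takeover_expected(17, 100, [100] * 7))
```

Under this literal reading the scaled expected results no longer sum to 1 unless the other players' expectations are adjusted too, which is one way the inflation or deflation mentioned above could arise.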

4. Variance.

The value of V is the most important part of this system which is not immediately obvious. It seems to me at the moment that V=25 is the best value, because it strikes a good balance between three things: winning and losing making an immediate, significant difference; people not having to play loads of games to get a good idea of where they are; and ratings not varying so much that they become rather meaningless. On the one hand the form of a player is indicated, but on the other the general standard is reported. This is another thing for which a six-month period of close observation would be important.
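To see what a given V means per game, here is a small sketch for a WTA game between seven equally rated players (an assumed scenario, so each expected result is 1/7). The alternative values 10 and 50 are included only for comparison and are not proposed in the post.

```python
def rating_change(result, expected, v):
    """Rating change for one game: V * (result - expected result)."""
    return v * (result - expected)


EXPECTED = 1 / 7  # seven equally rated players

for v in (10, 25, 50):
    solo = rating_change(1.0, EXPECTED, v)
    draw3 = rating_change(1 / 3, EXPECTED, v)
    loss = rating_change(0.0, EXPECTED, v)
    print(f"V={v:>2}: solo {solo:+.2f}, 3-way draw {draw3:+.2f}, loss {loss:+.2f}")
```

With V=25 a solo win among equals is worth about +21.43, a three-way draw about +4.76, and a defeat about -3.57.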


That pretty much sums up the latest system, one to which I cannot see major changes being made, only adjustments to Variance and to ensure civil disorder doesn’t lead to any inflation or deflation (even if it is negligible, it is still troubling).

As for the marriage between Points and Elo, that is another matter entirely, and one I am very happy to discuss. My thoughts up till now are:

There certainly should be a period where Elo is the secondary system, running visibly, but such that you have to go and look for it.

There should be a description of the points/Elo systems on the profile pages, making it entirely clear what is what to avoid confusion, should there be a long-term marriage.

Elo should be used for placings, but cannot undermine the points system, unless it replaces it entirely. In order not to undermine the points system, both should be shown by a player’s name, preferably the points first, so it might read: FredBloggs(1287/153)

Elo should have been visible to the community for a year (including any beta period) if it is going to supersede dippoints.


Last edited by TheGhostmaker on Mon Feb 02, 2009 4:20 pm, edited 1 time in total.

 Post subject: Re: RANKS!
PostPosted: Mon Sep 08, 2008 5:43 pm 

Joined: Fri Sep 05, 2008 7:02 pm
Posts: 22
Excellent work TheGhostmaker! :)

For the most part, I'll stay out of the technical discussions regarding Elo and leave that to you and Kestas. I am in favor of keeping the point system and implementing Elo to run alongside the points. To use TheGhostmaker's example, the name would read FredBloggs(1287) :arrow: the arrow symbol representing a ranking symbol. Your Elo points wouldn't be something you see immediately.

Also, as TheGhostmaker suggested, it would serve this new implementation well to indicate some sort of Beta or trial version of the system to provide an opportunity to work out any bugs or unforeseen effects. After the beta version has run its course, we could then decide how to progress...marry points with Elo?, keep them separate?, etc.

That said, in my opinion, it would be best to associate the new rankings and titles with the Elo system rather than points. But that's a technical matter for TheGhostmaker and Kestas to ultimately figure out. I'm not sure basing ranks on points is a good way to introduce the system if there's a plan to base it on something else (Elo) in the future. I suppose I'm of the opinion that if we are going to do it, let's do it right the first time...eh? :D


 Post subject: Re: RANKS!
PostPosted: Tue Sep 09, 2008 10:32 am 
Site Admin

Joined: Sat Jun 28, 2008 6:24 am
Posts: 892
Darwyn wrote:
Quote:
As I understand it the Elo system has inflation too, the way it currently works is to award titles to people based on which percentile they're in. i.e. if you're currently in the top 5% of points scorers you earn the "Diplomat" title, rather than it being based on a fixed number.


Okay...a percentile would definitely be better then. We'll just need to establish what those percentiles are...

But here's something else to consider. Again, if points based, buying into games will adjust your ranking downward. Do you think this would somewhat deter large pot games or perhaps encourage some form of abstinence? It's one thing to lose your points as part of an ante and being presented with having to win them back. But it's another to lose your points AND your rank, isn't it? I was hoping that rankings would establish themselves as being something a bit more cherished than points as a reliable indicator of skill. In other words, you can lose your points and still have your rank. After all, isn't establishing a hierarchy that is consistent sort of the whole point with rankings?

I do favor an ELO based ranking system over a points based one because it would very rightfully separate the two. Points being your bargaining power and ranks being your absolute position within the world of phpDip. I understand there are complexities involved, so if points based it must be, we should agree to work toward switching it to ELO based just as soon as it is possible. A ranking system that moves players up and down too often like the point system wouldn't be seen as very reliable...and thus would essentially defeat the purpose.

Well, I'm still far from convinced Elo is a better system, and I think that in reality people don't fluctuate that much. The hall of fame definitely doesn't see people jump up to the top and right back down on a daily basis; people creep up and down it, as far as I can tell, just as you'd want from a rating system

But we'll see; I can see why it'd be good to use ranks as the way to present the Elo system without numbers, I think that'd be a great way to merge the two systems without being too counter-intuitive. However I'm just not sure about Elo itself

Edit: Might take a while to digest your post and come up with a response TheGhostmaker, in case you see this before I have a chance to respond


 Post subject: Re: RANKS!
PostPosted: Tue Sep 09, 2008 12:40 pm 

Joined: Fri Sep 05, 2008 7:02 pm
Posts: 22
Quote:
But we'll see; I can see why it'd be good to use ranks as the way to present the Elo system without numbers, I think that'd be a great way to merge the two systems without being too counter-intuitive. However I'm just not sure about Elo itself


Fair enough. Perhaps with TheGhostmaker's help, we can begin to fully understand the steps needed to implement such a system before any decision is made about what to base ranks on (points v. Elo). The next step, then, would be for you and TheGhostmaker to collaborate and hash out an agreeable proposal for the Elo system, to find out where we stand in terms of the feasibility of implementing this.

Again, I'll leave that to you two. So the only way forward now is with you and Ghost. Keep us updated.

In the meantime, I will mess around with titles and icons. I do hope Churchill will be able to submit some icons as well.


 Post subject: Re: RANKS!
PostPosted: Tue Sep 09, 2008 1:27 pm 
Site Admin

Joined: Sat Jun 28, 2008 6:24 am
Posts: 892
TheGhostmaker wrote:
The Elo Rating system.

I have got further with Elo than I think Kestas is aware. I shall go through the whole thing from scratch, so it can be completely clear.

Thanks for this, hopefully I can take part in a more informed discussion now

Quote:
1. Standard games.

This is very easy to do, really:

New Rating = Old rating + V*(Result-Expected Result)

Expected Result= (player’s starting rating)/(Sum of all players’ starting ratings)

“V” I shall get onto later.

Result (PPSC)=SCs at end/34

Result (WTA)= 0 for defeat/survival, 1/n for an n-player draw, 1 for a solo win.

Because the sum of results = 1 = sum of expected results, there is no inflation of points.

Is the expected result calculated at the start of the game and stored, or calculated at the end of the game? I've found calculating it at the start makes civil disorders a lot more complex than they need to be, and creates problems when the expected value changes while a game is being played, so I'll assume the expected result is calculated at the end.

Isn't it also 1/n for an n-player draw in PPSC games? At the moment there's no difference between draws in the two game-types, and the consensus seems to be that it should stay that way.

Regarding the PPSC result; it'd have to be calculated using the number of supply centers which are owned by people who are still playing when the game finishes. Otherwise games which end when SCs have gone uncaptured (unlikely I know), and games which end with people in civil disorder, will have a total result for all players of less than 1 (i.e. the number of supply centers which are producing points is less than the total number of supply centers in the game).


If I understand right one of the key rules of this system is that the sum of (Result-Expected Result) for all players must equal 0. i.e. Sum[Result] = 1, and Sum[Expected Result] = 1. Otherwise it's no longer zero sum, and inflation or deflation will occur
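The PPSC adjustment described above can be sketched as follows, assuming the result is normalised over the supply centres held by players still in the game, and that anyone who has left (for example into civil disorder) scores 0. The function name and the example position are hypothetical.

```python
def ppsc_results_active(final_scs, active):
    """PPSC result normalised over SCs held by players still playing.

    final_scs: mapping of player name -> SCs held when the game ends.
    active: set of players still in the game at the end.
    Assumes at least one active player holds an SC.
    """
    live_total = sum(scs for name, scs in final_scs.items() if name in active)
    return {name: (scs / live_total if name in active else 0.0)
            for name, scs in final_scs.items()}


# Hypothetical ending: C is in civil disorder with 3 SCs, 3 SCs are uncaptured.
final_scs = {"A": 18, "B": 10, "C": 3}
results = ppsc_results_active(final_scs, active={"A", "B"})
print(results, sum(results.values()))
```

Dividing by 34 here would give results summing to 28/34; dividing by the 28 live SCs restores a total of 1, so the sum of (Result - Expected Result) can still be zero.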


Quote:
2. New players.

From the average performances in a first game, a new player’s level is 6/10ths of the average rating. It would therefore not make sense to have new players with a rating of 100, since that would simply inflate the ratings of those who play against them. Thus, a new player should have a rating of 60.

Sadly, if that were done and nothing else, the average would be reduced from 100 to 60, so everyone needs to be brought up to 100 points, but not necessarily all at once. New players should therefore start with a rating of sixty, but at the completion of every game be given 4 points, until they have completed 10 games. This means that the average is preserved without causing a results skew.

There is always the issue of “why did I get points when I lost a game”, which would probably be best tackled by just not showing Elo-ratings until ten games have been finished. That way we also get the advantage that we do not show unrealistic/unfounded Elo-ratings.

Okay, so as I understand it the two key ideas behind the system are: Sum[Result] = Sum[Expected Result] = 1, and Sum[Everyones-Rating]/Number-Of-Players-In-The-System = 100 (i.e. the mean rating is 100).

If people join and don't reach 10 games though it seems like the average will start to slip. Because we get so many new players trying Dip and leaving I'm sure this problem will cause the average to decrease over time, because many new members don't make it to 10 games, which may make deflation a problem.
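A rough illustration of that slippage, under two loud assumptions: every account scores exactly its expected result (so its rating is just 60 plus the bonus earned so far), and the mix of accounts is entirely made up.

```python
def expected_path_rating(games_completed):
    """Rating of an account that always scores exactly its expected result."""
    return 60 + 4 * min(games_completed, 10)


def mean_over_all_accounts(games_per_account):
    """Mean rating across every account, including those that stopped early."""
    ratings = [expected_path_rating(g) for g in games_per_account]
    return sum(ratings) / len(ratings)


# Hypothetical population: 3 accounts reached 10 games, 7 quit after 2 games.
population = [10] * 3 + [2] * 7
print(mean_over_all_accounts(population))
```

In this made-up population the mean over all accounts is 77.6 rather than 100, because the accounts that left early never collected the rest of their bonus.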

Having points slowly added over the first 10 games does seem a little messy, and it still only delays the problem of new players being started off at an "average" rating by a few games.

Also it does introduce another "magic number"; 6/10 or 60, which may result in more debate about the system which is exactly what I'm trying to avoid. I'd like to avoid adjusters and modifiers as much as possible.


It seems like whatever rank the player starts at there is one big difference between Elo and points which I think is very worrying; players can drop to below the score they started at. With points you can't go below a total of 100; the cost is inflation (if it's considered a bad thing), and that players can generate a stream of points out of nowhere, but the big benefit is that it doesn't provide an incentive to switch accounts.

If a player loses a game and is set back to, say, 50 from their starting 60, why not just create a new account? Who is going to play an entire other game to hope to get those 10 points back, when they could just create a new account and give it another shot?

This problem would be amplified by the fact that new players presumably wouldn't start at a score where they would get a medal, which means that as they went further into the red they'd get further from their first medal. If medals, instead of points, are going to be the main way players compare themselves, then being able to return instantly to where you were at the start, without needing to grind it out and win the points back, is a temptation I think few will resist.

If they are honest and do resist the temptation it'd be quite a drain for new players; not winning can be depressing, but actually going into negative territory would definitely turn a new player off playing, I think, and that's not something I think should be sacrificed even for a more accurate system.
It's about having fun first, getting new players involved, etc; if the format makes new players not want to play I think a tournament for the more experienced (like the one you're organizing) is a much better approach.


I imagine this approach would make multi-accounting more prolific by giving people greater incentive to create new accounts, and if inflation is something which this system is trying to fight this would be a source of inflation too: Even if the average score per user account remains at 100 that doesn't mean the average score per active user account won't increase: Every new account which loses points and is then discarded would increase the average rating held by active players. With multi-accounters this effect is increased, but even without multi-accounters losers will quit more often than winners, and the average score will suffer inflation.
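The effect can be shown with a toy example. The four accounts and their ratings below are invented purely for illustration; the only point is that the mean over all accounts and the mean over active accounts can diverge.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical ratings after some zero-sum games; the total is still 4 * 100.
ratings = {"winner": 121.0, "mid": 100.0, "loser_1": 90.0, "loser_2": 89.0}
abandoned = {"loser_1", "loser_2"}  # accounts discarded after losing

all_mean = mean(list(ratings.values()))
active_mean = mean([r for name, r in ratings.items() if name not in abandoned])
print(all_mean, active_mean)
```

The mean over every account is still 100, but the mean over the accounts that keep playing is 110.5, since the points lost by the discarded accounts stay with the players who beat them.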

Personally I don't consider inflation a problem, but if this system is an attempt to stop it and to provide a rating based on a fixed score rather than percentile I don't think it'll succeed.

Quote:
3. Civil Disorder.

When a player enters civil disorder, they simply lose the game.

When a player takes over a nation in civil disorder, an adjustment must be made to the expected result. The obvious solution is:

Expected Result= (SC count of Civil Disorder nation/34)*(player’s starting rating)/(Sum of all players’ starting ratings)

When a player falls into civil disorder, the ratings used for that nation in the calculation need to be decided. I suggest you take

Rating = [(starting player's rating)*(turns he played for) + 0*(number of turns in civil disorder) + (1st new player's rating)*(turns he played for) + (2nd new player's rating)*(turns he played for) + …] / (total number of turns).


There is a definite risk that this will cause a slight inflation or deflation of ratings. If this happens, then the formulae can be adjusted to stop it from occurring. I would suggest a six-month period where it is described as a “beta version” because of the uncertainties. We shall have to see what happens in order to arrive at a mathematically perfect system, which is without any doubt what I want to achieve, and what we should aim for.

I think Darwyn had it right when he said that we should get this right the first time. I wouldn't want to have any developer spend time on the system now only to change it later, when words are much easier to write than PHP

With the formulas above I'm not really sure, in a lot of the places where you refer to "civil disorder player", "starting player", "nth new player", or "ranking", exactly whose ranking or who you mean. Could you clarify those?

I should also note that besides CD players another complication is discovered multi-accounters. When they're discovered their points can simply be deleted, but how should a ranking be dropped back down without affecting the average in the system? I suppose by spreading the points out evenly, but how would that be done with ~9000 players and only 200 ranking points? I worry that things like these will catch the developer out too late if they're not thought through.

Quote:
4. Variance.

The value of V is the most important part of this system which is not immediately obvious. It seems to me at the moment that V=25 is the best value, because it strikes a good balance between three things: winning and losing making an immediate, significant difference; people not having to play loads of games to get a good idea of where they are; and ratings not varying so much that they become rather meaningless. On the one hand the form of a player is indicated, but on the other the general standard is reported. This is another thing for which a six-month period of close observation would be important.

That pretty much sums up the latest system, one to which I cannot see major changes being made, only adjustments to Variance and to ensure civil disorder doesn’t lead to any inflation or deflation (even if it is negligible, it is still troubling).

Hmm, I'm really hoping that this system will be inherently un-nitpickable, rather than something which requires even more work down the line. As someone who isn't especially interested in the ranking, the main incentive to change for me is to end the debate once and for all; you can see how, if it doesn't give a convincing promise of that, it's not something I can get behind.

I've already said that I'm not keen on "magic numbers" like V, which could become a starting point for even more future debates. As for the talk about needing to revise the system down the line, I definitely agree with Darwyn that it has to be right the first time. Players really hate it when they've built up their rating and then the rules change; I'm not looking forward to doing it just for Elo (especially on Facebook Diplomacy where it hasn't had a dedicated evangelist like yourself :P), let alone doing it once every six months.


Moving on from V being somewhat arbitrary I'm also not sold on the way it works. It is essentially a fixed bet, by determining how much of your rank is at stake in a game. I quite like how, in the current system, players can choose to have light hearted games or play with newcomers without worrying about the points, or gamble a hefty chunk of their points in an all-out serious game, and everything in-between.

The fixed V also means increases in rank are linear, unlike the points system where there's an approximately exponential distribution. I like how the higher up you go the more points you play with, and how you can climb the exponential curve surprisingly quickly if you play well. I think it'd feel more rewarding to go from 1800 to 2000 after a big win than from 100 to 110, if you're an average player, and 180 to 190, if you're a top player playing with the best on the site. That's subjective of course, but maybe others feel the same way.


The fixed bet size draws another interesting comparison with the points system; in the points system everyone bets the same amount into a game, but in Elo the "Expected result" essentially represents the bet, and the "bet" varies from player to player. A new player playing among far better players may only need to hang on to one supply center to qualify for a win, whereas (as an extreme case) a top player may be unable to possibly break-even as they could have an "Expected result" of over 1/2.

Granted that's an extreme case which it may never reach, but it does show the imbalance which the system causes and which isn't present in the points system. Giving the game objectives which vary from player to player could have large impacts on the gameplay; it's almost literally like giving a top player a Risk mission card saying "You must capture 24 territories" while weaker players get a mission card saying "You must hold Australia and South America". I think it could really unbalance the game.
You could argue that PPSC itself changes the objectives, but specifically I think giving players different requirements for success means that there's more things deciding different players' actions than diplomacy alone.
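The differing "bets" can be made concrete with a short sketch. It assumes the proposed PPSC result of SCs/34 and uses exact fractions to avoid rounding trouble; the function name and the example ratings are hypothetical.

```python
from fractions import Fraction
import math

TOTAL_SCS = 34
SOLO_SCS = 18  # supply centres needed for a solo win on the standard map


def break_even_scs(rating, all_ratings):
    """Smallest SC count whose PPSC result is at least the expected result."""
    expected = Fraction(rating, sum(all_ratings))
    return math.ceil(expected * TOTAL_SCS)


# Hypothetical field: one player rated 450 and six new players rated 60.
field = [450] + [60] * 6
print(break_even_scs(450, field))      # top player's target
print(break_even_scs(60, field))       # each newcomer's target
print(break_even_scs(100, [100] * 7))  # seven equal players
```

In this invented field the top player would need 19 centres to break even, more than the 18 that end the game with a solo, while each newcomer breaks even on 3. Seven equal players each need 5.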


The way the Elo and points system will interact may also cause problems. Specifically if a player bets away all their points but still has a high rating they'll need to play with newcomers again, but be at a huge handicap with such a high expected result that the points they'll need to build back up will have to be earned from newcomers even while their rating goes down.
A highly ranked player playing with low ranked, low points players could gain points yet lose rank. I think how the two systems will work alongside each other needs more careful thought to say the least. Plus replacing the points system entirely would bring up a whole new bunch of requirements which I don't even want to think about


To finish off this post which is already way too long: Players may be more fearful of playing up-and-coming new players, and may be more inclined to rest on their laurels and less inclined to play. Firstly, they can't decide how much to risk on a certain game, which means each game has to be played more carefully and new players can't be taken as lightly. Secondly, they're at a major disadvantage if they think the lower ranked players they're up against are better than their rank implies.
I think inflation will still occur, but be decreased, and I think that'll give players less of an incentive to play to maintain their rank. Again, some think that would be a good thing, but I don't think getting to the top and waiting there is something which should be encouraged.


So there's a list of the problems which I find with the system; some are weak and subjective, others I think are rather critical and need to be fixed before it can be implemented. I think this is a system which works well for things like chess, in settings where multi-accounts and anonymity don't exist, and matches are forced, where players don't want options other than all-or-nothing, and where the system is intended to set up a consistently serious and non-casual game. But I don't think the system applies here, just like the points system wouldn't apply there.


At the very least I think it has to be said that there is fault to be found within the new system, which means it wouldn't be the end of the ranking system debates, and that really decreases my motives for wanting it in.
You probably noticed I practically didn't talk about WTA games at all, which is just because the Elo system works a lot better for WTA games and there's a lot less to fault it for; but my point isn't that the system is inherently inferior in every way but that it is a different compromise, and not the obvious choice/perfect system which it's made out to be. That being the case I'm not sure using two imperfect ranking systems is going to make anyone happier, rather give them more to complain about.

It may be a double standard to expect Elo to be perfect when the points system isn't, but there's a difference between not being perfect and being broken. I don't think either of our systems is broken, I think both have pros and cons, but the points system is tried, tested, working, and established, and the Elo system isn't.



If you feel exasperated at having your system nitpicked, as if by a thousand clumsy monkeys with sharpened nails, this is what the system will be subject to if it isn't incontrovertibly better, and the last thing I want is to spend lots of effort trading one imperfect system for another, or adding one imperfect system onto another.



As one last, very final word to this post: phpDiplomacy is open source, so whether or not you convince me you're free to take the code, get your new system added, create a new site based on it, and if the rating system blows mine away I'll be able to eat my words and everyone will benefit from the improved code :)


PostPosted: Tue Sep 09, 2008 2:57 pm 

Joined: Fri Sep 05, 2008 7:02 pm
Posts: 22
And that's why I'm staying out of the technical discussion... ;)

Lots of things to think about that I wasn't even aware of.


PostPosted: Tue Sep 09, 2008 7:10 pm 

Joined: Tue Aug 26, 2008 8:46 pm
Posts: 249
Quote:
It may be a double standard to expect Elo to be perfect when the points system isn't, but there's a difference between not being perfect and being broken. I don't think either of our systems is broken, I think both have pros and cons, but the points system is tried, tested, working, and established, and the Elo system isn't.

If you feel exasperated at having your system nitpicked, as if by a thousand clumsy monkeys with sharpened nails, this is what the system will be subject to if it isn't incontrovertibly better, and the last thing I want is to spend lots of effort trading one imperfect system for another, or adding one imperfect system onto another.

As one last, very final word to this post: phpDiplomacy is open source, so whether or not you convince me you're free to take the code, get your new system added, create a new site based on it, and if the rating system blows mine away I'll be able to eat my words and everyone will benefit from the improved code


I think it may be appropriate to deal with what came last, first. Firstly, there is nothing wrong with the points system until you consider that it needs to be able to give an indication of a player's skill. In that respect it is fundamentally flawed. However, to hold Elo to a double standard is unequivocally right: an incremental improvement, as you have said, cannot justify the change and the time taken.

Nor am I upset or frustrated by the criticism; it is healthy. But I don't think the last point is. I couldn't possibly hope to build up a new community on the basis of a rating system change, least of all with schoolwork, nor do I wish to split the phpdip community by doing so.

1.
Quote:
Is the expected result calculated at the start of the game and stored, or calculated at the end of the game? I've found calculating it at the start makes civil disorders a lot more complex than they need to be, and creates problems when the expected value changes while a game is being played, so I'll assume the expected result is calculated at the end.

Isn't it also 1/n for an n-player draw in PPSC games? At the moment there's no difference between draws in the two game-types, and the consensus seems to be that it should stay that way.

Regarding the PPSC result; it'd have to be calculated using the number of supply centers which are owned by people who are still playing when the game finishes. Otherwise games which end when SCs have gone uncaptured (unlikely I know), and games which end with people in civil disorder, will have a total result for all players of less than 1 (i.e. the number of supply centers which are producing points is less than the total number of supply centers in the game).


If I understand right one of the key rules of this system is that the sum of (Result-Expected Result) for all players must equal 0. i.e. Sum[Result] = 1, and Sum[Expected Result] = 1. Otherwise it's no longer zero sum, and inflation or deflation will occur


Calculate whenever it is simplest. I have always presumed that that would be at the end.

Yes, PPSC draws are 1/n, I beg your pardon.

In the case of uncaptured SCs, you are right: the divisor would have to be the number of SCs held by players still in the game rather than 34, and similarly with Civil Disorder. This is exactly why such debate is healthy.

And yes, non-inflation is an essential principle.

2.
Quote:
Okay, so as I understand it the two key ideas behind the system are: Sum[Result] = Sum[Expected Result] = 1, and Sum[Everyones-Rating]/Number-Of-Players-In-The-System = 100 (i.e. the mean rating is 100).

If people join and don't reach 10 games though it seems like the average will start to slip. Because we get so many new players trying Dip and leaving I'm sure this problem will cause the average to decrease over time, because many new members don't make it to 10 games, which may make deflation a problem.

Having points slowly added over the first 10 games does seem a little messy, and it still only delays the problem of new players being started off at an "average" rating by a few games.

Also it does introduce another "magic number"; 6/10 or 60, which may result in more debate about the system which is exactly what I'm trying to avoid. I'd like to avoid adjusters and modifiers as much as possible.


If people join and don't play 10 games, they might as well not exist. I am not interested in the average rating of players in general, but in the average rating of ratable players (those who have played sufficient games to make a good guess at things). You can do it for more or fewer games, but it is essentially right. Experience is vital in diplomacy, and even a poor performance is very useful to a new player, hence...

Finally, this "magic number" is rather more an observation of statistical trends, which does sound rather better. If you look at the database, and find the average result for new players, it comes out at an average of 60.

Quote:
It seems like whatever rank the player starts at there is one big difference between Elo and points which I think is very worrying; players can drop to below the score they started at. With points you can't go below a total of 100; the cost is inflation (if it's considered a bad thing), and that players can generate a stream of points out of nowhere, but the big benefit is that it doesn't provide an incentive to switch accounts.

If a player loses a game and is set back to, say, 50 from their starting 60, why not just create a new account? Who is going to play an entire other game to hope to get those 10 points back, when they could just create a new account and give it another shot?


This is a very real consideration, and one I am glad you mentioned. Of course the last thing I want is a new generation of "Elo-multis", as that would be disturbing and damaging to the value of the Elo rating. By hiding things such as starting rank, you begin to tackle the problem. Because everyone will be "unrated" for ten games, there is a downside to starting over: you cannot just make a new account and get an immediate improvement. It takes ten games (which needless to say is quite a few) to remove the dreadful "unrated" badge, and over that period it is unlikely that a player will be able to have stunningly good results. Given the choice between winning a game and getting over 20 points or playing ten to start a new account, I know where I would be heading.

Quote:
This problem would be amplified by the fact that new players presumably wouldn't start at a score where they would get a medal, which means that as they went further into the red they'd get further from their first medal. If medals, instead of points, are going to be the main way players compare themselves, then when you get further from your goal and can instantly start back where you were at the start, without needing to grind it out and win the points back, I think that'll be a temptation few will resist.

As I said, we would have an "unrated" stage, because it is impossible to rate a new player.

Quote:
If they are honest and do resist the temptation it'd be quite a drain for new players; not winning can be depressing, but actually going into negative territory would definitely turn a new player off playing, I think, and that's not something I think should be sacrificed even for a more accurate system.
It's about having fun first, getting new players involved, etc; if the format makes new players not want to play I think a tournament for the more experienced (like the one you're organizing) is a much better approach.


I imagine this approach would make multi-accounting more prolific by giving people greater incentive to create new accounts, and if inflation is something which this system is trying to fight this would be a source of inflation too: Even if the average score per user account remains at 100 that doesn't mean the average score per active user account won't increase: Every new account which loses points and is then discarded would increase the average rating held by active players. With multi-accounters this effect is increased, but even without multi-accounters losers will quit more often than winners, and the average score will suffer inflation.

Personally I don't consider inflation a problem, but if this system is an attempt to stop it and to provide a rating based on a fixed score rather than percentile I don't think it'll succeed.


There is a distinction to be made between bad players who leave and discards where multi-accounting is involved. Inflation from the former is perfectly good, because the average rating does improve when Simple Simon leaves; the latter is clearly bad, and is an important issue to be looked at, as I have addressed earlier. Assuming that this is tackled, the zero-inflation target should be reachable.

3.
Quote:
I think Darwyn had it right when he said that we should get this right the first time. I wouldn't want to have any developer spend time on the system now only to change it later, when words are much easier to write than PHP

With the formulas above I'm not really sure in a lot of places when you refer to "civil disorder player", "starting player", "nth new player", or "ranking" exactly who's ranking or who you mean. Could you clarify those?

I should also note that besides CD players another complication are discovered multi-accounters. When they're discovered their points can be simply deleted, but how should a ranking be dropped back down without affecting the average in the system? I suppose by spreading the points out evenly, but how would that be done with ~9000 players and only 200 ranking points? I worry that things like these will catch the developer out too late if it's not thought out.


Getting the principles right is easy enough when it is looked at in the right way. Professor Arpad Elo is responsible for the majority of that. Getting the numbers right is harder. Once the development is done, adding multipliers (no pun intended), which would be all that I would do, shouldn't be more than just changing a number, so in terms of developer time, I am not too concerned.

When a multi is discovered, it makes sense to add the points he gained to his victims, and a simple system of giving them each a percentage increase to their rating for each game played is fair since those of better rank are more affected.

Right, as for the Expected result/Rating formulae, I can explain them easily. Expected result for CD takeover = standard expected result * SC count / total SCs. The rating for a CD-affected nation is simply a weighted mean of the ratings of the players who played it, with the period in CD counted as a player of rating 0. The weighting is in proportion to the number of turns played.
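
To spell the two CD formulas out, here is a minimal Python sketch. The function names, the 34-centre total, the timing of the SC count (taken at takeover), and the reading of "standard expected result" as rating divided by the sum of all ratings are assumptions made for illustration, not settled parts of the proposal.

```python
from fractions import Fraction

TOTAL_SCS = 34  # supply centres on the standard board (assumed constant)


def nation_rating(segments):
    """Weighted mean rating for a nation that changed hands.

    segments: list of (rating, turns_played) pairs, one per occupant.
    A stretch spent in civil disorder is entered as rating 0.
    """
    total_turns = sum(turns for _, turns in segments)
    weighted = sum(Fraction(rating) * turns for rating, turns in segments)
    return weighted / total_turns


def takeover_expected_result(rating, all_ratings, sc_count):
    """Expected result for a player taking over a CD nation.

    Standard expected result (rating / sum of ratings), scaled by the
    share of the board the nation holds (assumed: counted at takeover).
    """
    standard = Fraction(rating, sum(all_ratings))
    return standard * Fraction(sc_count, TOTAL_SCS)


# A nation played 6 turns by a 120-rated player, 2 turns in CD,
# then 4 turns by a 90-rated player.
print(nation_rating([(120, 6), (0, 2), (90, 4)]))

# A 100-rated player takes over a 3-centre nation among seven 100s.
print(takeover_expected_result(100, [100] * 7, 3))
```

Whether the expected results across all seven nations still sum to one depends on how the remaining players' expected results are computed, which this sketch leaves open.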

4.
Quote:
Hmm, I'm really hoping that this system will be inherently un-nitpickable, rather than something which requires even more work down the line. As someone who isn't especially interested in the ranking the main incentive to change for me is to end the debate once and for all, you can see how if it doesn't give a convincing promise of that then it's not something I can get behind.

I've already said that I'm not keen on "magic numbers" like V which could be a point for even more future debates to start around, but on the talk about needing to revise the system down the line, I definitely agree with Darwyn that it has to be right the first time. Players really hate it when they've built up their rating and then the rules change; I'm not looking forward to doing it just for Elo (especially on Facebook Diplomacy where it hasn't had a dedicated evangelist like yourself), let alone doing it once every six months.


What will end the debate once and for all is for the proportion of the community who can and do judge people's ratings by sight to agree with the ratings that come out at the end, and so be able to trust the rating system. Flashman and thewonderllama are two excellent examples of players who are underrated, and had I or arthurmklo won Revelstone instead of MarekP (very plausible), we would have been overrated.

I think a "Magic Number" seems to mean a number which is not intuitively obvious for the system. V is the most obvious. There is very little intuitive feel for V just from thinking about the system. I got some from playing with data (which, incidentally, I have sadly lost), and it seemed that going above 35 or below 20 was definitely wrong. All I am saying is that a six month beta would really effectively refine everyone's intuition, not least my own, on this matter, so the "Right" value will become that much clearer. So rather than once every six months, it is a matter of once in six months' time, and rarely, if ever, again (at this point it is possible to take a much more justified "near enough is good enough" approach to things). As for changing V, you can do that without changing ratings. Increasing V will just add some pace into the changes we see, whilst decreasing it will just make matters a bit less frantic, but V being a little wrong (and remember that we are talking about 1-5% of your average rating a game here, which in a game with the margins of diplomacy, is fairly small) doesn't in any way say that the ratings are wildly wrong; they are just a bit more ballpark or a bit too slow-moving.
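
To give some feel for what V does, here is a rough Python sketch of the update rule from the top of this thread (New Rating = Old rating + V*(Result - Expected Result)), run at the two ends of the 20 to 35 range mentioned above. The function name and the example ratings are made up for illustration.

```python
def update_ratings(ratings, results, v):
    """Apply New = Old + V * (Result - Expected) to every player.

    ratings: starting ratings of the players in the game.
    results: each player's result; these should sum to 1.
    v: the scaling constant V discussed above.
    """
    total = sum(ratings)
    expected = [r / total for r in ratings]
    return [old + v * (res - exp)
            for old, res, exp in zip(ratings, results, expected)]


# Seven players rated 100 each; the first wins a WTA solo.
ratings = [100.0] * 7
results = [1.0] + [0.0] * 6

for v in (20, 35):
    new = update_ratings(ratings, results, v)
    # winner's new rating, a loser's new rating, total rating in the game
    print(v, round(new[0], 2), round(new[1], 2), round(sum(new), 2))
```

In this example the winner gains roughly 17 points with V = 20 and 30 points with V = 35, while the total rating held by the seven players stays at 700 either way.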

Quote:
Moving on from V being somewhat arbitrary I'm also not sold on the way it works. It is essentially a fixed bet, by determining how much of your rank is at stake in a game. I quite like how, in the current system, players can choose to have light hearted games or play with newcomers without worrying about the points, or gamble a hefty chunk of their points in an all-out serious game, and everything in-between.

The fixed V also means increases in rank are linear, unlike the points system where there's an approximately exponential distribution. I like how the higher up you go the more points you play with, and how you can climb the exponential curve surprisingly quickly if you play well. I think it'd feel more rewarding to go from 1800 to 2000 after a big win than from 100 to 110, if you're an average player, and 180 to 190, if you're a top player playing with the best on the site. That's subjective of course, but maybe others feel the same way.


I don't do psychology, perhaps a handicap in diplomacy, but I don't really see it as a problem that you cannot get dramatic leaps from one game. I see it as a problem that you can. An "unrated" game option allows for the less stern side of diplomacy, but when diplomacy is "serious" (I use the term cautiously, I mean, "when the outcome is cared about"), everyone has to play to win, and play hard, else the game becomes rather devalued, surely.


Quote:
The fixed bet size draws another interesting comparison with the points system; in the points system everyone bets the same amount into a game, but in Elo the "Expected result" essentially represents the bet, and the "bet" varies from player to player. A new player playing among far better players may only need to hang on to one supply center to qualify for a win, whereas (as an extreme case) a top player may be unable to possibly break-even as they could have an "Expected result" of over 1/2.

Granted that's an extreme case which it may never reach, but it does show the imbalance which the system causes which isn't present in the points system. Giving the game objectives which vary from player to player could have large impacts on the gameplay; it's almost literally like giving a top player a risk mission card saying "You must catch 24 territories" and weaker players get a mission card saying "You must hold australia and south america", I think it could really unbalance the game.
You could argue that PPSC itself changes the objectives, but specifically I think giving players different requirements for success means that there's more things deciding different players' actions than diplomacy alone.

The "mission card" already exists. It reads "get 5 centres or a draw". And that case simply wouldn't happen without dramatic circumstances, i.e. a player having a rating of 300+ and playing a rated game against six players of 50-. And that would be a pretty awful game to be playing in. There is another critique of the system that I have thought of myself, which I shall address later on, that relates to this.
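
As a rough illustration of where these figures come from, the sketch below computes the PPSC break-even centre count, assuming Result = SCs/34 and Expected Result = rating / sum of ratings as defined at the top of the thread. The function name is hypothetical.

```python
from fractions import Fraction

TOTAL_SCS = 34  # supply centres on the standard board


def break_even_centres(rating, all_ratings):
    """SC count at which PPSC Result (SCs/34) equals Expected Result."""
    expected = Fraction(rating, sum(all_ratings))
    return expected * TOTAL_SCS


# Seven equally rated players: 34/7, a little under 5 centres each.
equal = break_even_centres(100, [100] * 7)

# One player at 300 against six at 50: expected result is exactly 1/2.
lopsided = break_even_centres(300, [300] + [50] * 6)

print(float(equal), float(lopsided))
```

With equal ratings the break-even point is 34/7 centres, which is the "get 5 centres" card; the 300-against-six-50s game puts it at 17 centres, so only ratings more lopsided than that push the Expected Result past 1/2.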

Quote:
The way the Elo and points system will interact may also cause problems. Specifically if a player bets away all their points but still has a high rating they'll need to play with newcomers again, but be at a huge handicap with such a high expected result that the points they'll need to build back up will have to be earned from newcomers even while their rating goes down.
A highly ranked player playing with low ranked, low points players could gain points yet lose rank. I think how the two systems will work alongside each other needs more careful thought to say the least. Plus replacing the points system entirely would bring up a whole new bunch of requirements which I don't even want to think about.

The failing there would be in the Points system essentially giving a bonus to players who do what in other games is known as "noob-bashing".

Quote:
Players may be more fearful of playing up and coming new players, and may be more inclined to rest on their laurels and less inclined to play. Firstly they can't decide how much to risk on a certain game, which means each game has to be played more carefully and new players can't be taken as lightly, secondly they're at a major disadvantage if they think the lower ranked players they're up against are better than their rank implies.
I think inflation will still occur, but be decreased, and I think that'll give players less of an incentive to play to maintain their rank. Again, some think that would be a good thing, but I don't think getting to the top and waiting there is something which should be encouraged.


Well, this is a matter of what players care about more, rating or playing. I should think the latter is the more likely.

Quote:
I think this is a system which works well for things like chess, in settings where multi-accounts and anonymity don't exist, and matches are forced, where players don't want options other than all-or-nothing, and where the system is intended to set up a consistently serious and non-casual game. But I don't think the system applies here, just like the points system wouldn't apply there.


That is more of a general critique of rating in phpdiplomacy, rather than of Elo, and I don't think that it is correct either, because we all have names, just not necessarily real names, we are all still playing to win, even if it is the enjoyment we play for, and the grading of results fixes the All-or-Nothing "problem"- I use the quotation marks as it is the same as in a draw in chess.

There is one thing that I have thought of having written this post. I have not accounted for the slightly non-deterministic/luck element of a game. Whim does play a part such that there is a limit to how well a player can do, and hence this should be added to the Expected Result formula. Looking at the best players' win percents gives an idea that the magnitude of this figure must be above 60%. Probably 75% is a good estimate. To make an adjustment for this you simply change the Expected result formula as below:

Expected Result= Old formula* 3/4 + (1/7)*(1/4)

I apologise for adding in another "Magic number" but the logic, the justification, is sound, and so I feel perhaps it is for the best.
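
A quick check of the adjusted formula, as a sketch assuming seven players and the ratings-share Expected Result from the top of the thread: because the (1/7)*(1/4) term is added once per player, the adjusted expected results still total 1. The function name, the `skill_share` parameter, and the example ratings are illustrative only.

```python
from fractions import Fraction


def adjusted_expected(ratings, skill_share=Fraction(3, 4)):
    """Expected Result = Old formula * 3/4 + (1/n) * (1/4) for each player."""
    total = sum(ratings)
    n = len(ratings)  # 7 in a standard game
    luck_share = 1 - skill_share
    return [Fraction(r, total) * skill_share + Fraction(1, n) * luck_share
            for r in ratings]


# One player at 300 against six at 50.
expected = adjusted_expected([300, 50, 50, 50, 50, 50, 50])
print(expected[0], sum(expected))
```

In this example the top player's Expected Result drops from 1/2 to 23/56, and the seven adjusted values sum to exactly 1.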

I would very much like to have a few questions answered on this topic. These are:

1. Would you allow a development period where it runs invisibly in the background, so that it isn't displayed but we(I?) can make final adjustments to make it stay right?
2. Would you allow a beta period where it can be seen, after I am confident of the accuracy of it, where we can look at what others think/their own assessments, so that further improvements can be made?
3. If there was popular support, you were fairly convinced by the accuracy, and you were reasonably convinced that it would not have an adverse effect on the community, would you be happy to take up the implementation of this long-term?

3 may seem like a poor question to ask, with only one answer, but I will accept either yes, no or any other answer (although fence-sitting may be of some annoyance), I really would just like to know.


PostPosted: Tue Sep 09, 2008 7:33 pm 
Offline
Site Admin

Joined: Sat Jun 28, 2008 6:24 am
Posts: 892
Just letting you know I've read this, thanks for the response. I've got a mid-semester tomorrow so it may be a couple of days before I can go into this more


 Post subject: Re: RANKS!
PostPosted: Sat Sep 13, 2008 3:12 am 
Offline

Joined: Sun Sep 07, 2008 8:56 pm
Posts: 22
kestasjk wrote:
But we'll see; I can see why it'd be good to use ranks as the way to present the Elo system without numbers, I think that'd be a great way to merge the two systems without being too counter-intuitive. However I'm just not sure about Elo itself


I am in agreement with Kestas, and Darwyn.

Points should retain their original function: to allow buy-ins and rewards for games.

ELO should be used to rank the players (I agree that raw score should NOT be displayed) from top to bottom.

As far as displaying, I agree with Darwyn, but with a small variance.

ELO would be represented as a rank symbol (based on their relative position and percentile), then the user name, then the points. (It seems cleaner to me to put the rank before the username as they are consistent in size, but usernames and point totals are not).

I.E.

:!: Darwyn (376 D)

(It might be an idea to make the ranks the same size as the font, and reduce the D icon as well.)


PostPosted: Sat Sep 13, 2008 9:28 am 
Offline
Site Admin

Joined: Sat Jun 28, 2008 6:24 am
Posts: 892
TheGhostmaker wrote:
I think it may be appropriate to deal with what came last, first. Firstly, there is nothing wrong with the points system until you consider that it needs to be able to give an indication of a player's skill. In that respect it is fundamentally flawed, however, to expect double standards is unequivocally right. An incremental improvement, as you have said, cannot justify the change and time taken.

Okay, I'm glad we see eye to eye on that

Quote:
Nor am I upset or frustrated by the criticism, it is healthy, but I don't think the last point is so. I couldn't possibly hope to build up a new community on the basis of a rating system change, and with schoolwork, and nor do I wish to do so, to split the phpdip community.

Fair enough, but this is essentially how variable phase lengths made it in; someone forked the site and proved that people used it and had no problem with it

Quote:
1. Calculate whenever it is simplest. I have always presumed that that would be at the end.

Yes, PPSC draws are 1/n, I beg your pardon.

In the case of uncaptured SCs, you are right, it would have to be 1/number of SCs, similarly with Civil Disorder. This is exactly why such debate is healthful.

And yes, non-inflation is an essential principle.

Okay, but why is non-inflation an essential principle? Should a player who makes it to the top when the server is first set up remain at the top indefinitely?

Quote:
2. If people join and don't play 10 games, they might as well not exist. I am not interested in the average rating of players in general, but the average rating of ratable players (those who have played sufficient games to make a good guess at things). You can do it for more or fewer games, but it is essentially right. Experience is vital in diplomacy, and even a poor performance is very useful to a new player, hence...

But if only the average of the rateable players matters that average will go up over time, and that'll cause inflation

Quote:
Finally, this "magic number" is rather more an observation of statistical trends, which does sound rather better. If you look at the database, and find the average result for new players, it comes out at an average of 60.

Hmm, but that was based on results before the points system was put in place?

Quote:
This is a very real consideration, and one I am glad you mentioned. Of course the last thing I want is a new generation of "Elo-multis", as that would be disturbing and damaging to the value of Elo-rating. By hiding stuff such as starting rank, you begin to tackle the problem. Because everyone will be "unrated" for ten games, there is a down side to this: you cannot just make a new account and get an immediate improvement, it takes ten games (which needless to say is quite a few) to remove the dreadful "unrated" badge, and over that period it is unlikely that a player will be able to have stunningly good results. Given the choice between winning a game and getting over 20 points or playing ten to start a new account, I know where I would be heading.

Firstly 10 games is quite a lot to play one by one (and is another magic number :P), the average number of games which a player with over 100 points has played is 11.94, which means the average player would only just have got a rank. I think this would encourage players to play more games than they otherwise would to try and quickly get their first medal.
Also I don't think it solves the problem but just delays it or makes it less likely players would make it that far

Quote:
There is a distinction to be made between bad players who leave and discards where multi-accounting is involved. Inflation from the former is perfectly good, because the average rating does improve when Simple Simon leaves; the latter is clearly bad, and is an important issue to be looked at, as I have addressed earlier. Assuming that this is tackled, the zero-inflation target should be reachable.

But isn't the reason for having the Elo system that you can't beat thousands of Simple Simons to make it to the top?

Quote:
3. Getting the principles right is easy enough when it is looked at in the right way. Professor Arpad Elo is responsible for the majority of that. Getting the numbers right is harder. Once the development is done, adding multipliers (no pun intended), which would be all that I would do, shouldn't be more than just changing a number, so in terms of developer time, I am not too concerned.

When a multi is discovered, it makes sense to add the points he gained to his victims, and a simple system of giving them each a percentage increase to their rating for each game played is fair since those of better rank are more affected.

Tracking a multi's victims specifically would be a rather significant extra coding problem, requiring new data be stored in the database. I don't envy the guy who would be writing this :s

Quote:
Right, as for the Expected result/Rating formulae, I can explain them easily. Expected result for CD takeover = standard expected result * SC count / total SCs. The rating for a CD-affected nation is simply a weighted mean of the ratings of the players who played it, with the period in CD counted as a player of rating 0. The weighting is in proportion to the number of turns played.

Is the sc count/total scs taken when the CD player gets taken over, or when the game ends? I'm assuming when the CD player gets taken over, which means further data needs to be stored in the database. Feasibility of the implementation is going to be important, I think
Also I'm not really sure about this weighting, my brain may not be working right now but I can't quite figure this out, and it's very important that the sum of the expected results and the sum of the results are both equal to one to prevent inflation. Could you perhaps spell this out to me using an example?

Quote:
4. What will end the debate once and for all is for the proportion of the community who can and do judge people's ratings by sight to agree with the ratings that come out at the end, and so be able to trust the rating system. Flashman and thewonderllama are two excellent examples of players who are underrated, and had I or arthurmklo won Revelstone instead of MarekP (very plausible), we would have been overrated.

Well there's always an element of chance in Diplomacy games. If you win a large game in an Elo system and other players might just as likely have won it, why wouldn't the same problem apply there?

Quote:
The "mission card" already exists. It reads "get 5 centres or a draw".

But the point is that in Elo it could vary for each player, and that the criteria for "victory", in terms of gaining or losing rank, varies. I think starting from an equal position with equal objectives is pretty critical to what Diplomacy is, and the Elo system with the expected result changes that

Quote:
The failing there would be in the Points system essentially giving a bonus to players who do what in other games is known as "noob-bashing"

Well as I understand it Elo would have to work alongside the points system, since it's generally accepted Elo couldn't replace all of the points system's functionality. If that's not the case the scope for this argument is going to get a lot bigger than how Elo functions as a rating system, if that is the case then Elo needs to work alongside the points system whatever its perceived flaws.

Quote:
Well, this is a matter of what players care about more, rating or playing. I should think the latter is the more likely.

Hmm, I'm not too sure about that. At the very least it varies from person to person, but especially if the Elo rating system is supposed to be more truthful, and people are given ranking icons, they would be even more likely than they are now to care about their rank.

Quote:
That is more of a general critique of rating in phpdiplomacy, rather than of Elo

That's true, but my point was also that there can't be a perfect rating system for phpDiplomacy; it's going to be futile to try to build a perfect system for ratings in an imperfect environment

Quote:
There is one thing that I have thought of having written this post. I have not accounted for the slightly non-deterministic/luck element of a game. Whim does play a part such that there is a limit to how well a player can do, and hence this should be added to the Expected Result formula. Looking at the best players' win percents gives an idea that the magnitude of this figure must be above 60%. Probably 75% is a good estimate. To make an adjustment for this you simply change the Expected result formula as below:

Expected Result= Old formula* 3/4 + (1/7)*(1/4)

I apologise for adding in another "Magic number" but the logic, the justification, is sound, and so I feel perhaps it is for the best.

I do worry that as we try to account for more and more problems the system will become more and more complex, which I /really/ want to avoid. Also isn't the sum of expected results supposed to add up to 1 to counter inflation? 1 * 3/4 + 1/7 * 1/4 is less than 1, which would make the sum of (result - expected result) greater than 0, which would mean playing a game would generate ranking without taking it away.

Quote:
I would very much like to have a few questions answered on this topic. These are:

1. Would you allow a development period where it runs invisibly in the background, so that it isn't displayed but we(I?) can make final adjustments to make it stay right?

I can make the relevant data available, but running an experiment within the code itself would be troublesome to maintain across multiple sites. Also the people knowing that they are being ranked differently will affect how they play; the rating system might do an okay job when no-one knows about it only to need much further adjustment when they are told how it works.

Quote:
2. Would you allow a beta period where it can be seen, after I am confident of the accuracy of it, where we can look at what others think/their own assessments, so that further improvements can be made?

I really don't like beta periods, it seems rather uncertain. I wouldn't want to roll it out only to modify it or drop it at a later date, leaving people wondering what use points are and angering people who did well in the new system. If the feature gets rolled out it really needs to be complete

Quote:
3. If there was popular support, you were fairly convinced by the accuracy, and you were reasonably convinced that it would not have an adverse effect on the community, would you be happy to take up the implementation of this long-term?

As you said; incremental improvements aren't worthwhile. If the system isn't clearly inherently better without needing to be implemented I don't think there is enough justification for all the effort. Implementing a feature as a test to see whether it would work or not is something which I haven't done before and which I think is bad practice.

Quote:
3 may seem like a poor question to ask, with only one answer, but I will accept either yes, no or any other answer (although fence-sitting may be of some annoyance), I really would just like to know.

If you're looking for a yes or no answer it's currently no, and I can't foresee any changes or counter-arguments that would change that :(

The question in mind when getting this discussion going was "will this new system end the debate over ranking systems?", not even "is it a better ranking system?". I'm not sure about the answer to the latter, but I'm positive about the answer to the former; this new system will not end the debate, people will still find real or perceived faults in it.
If the new system is better according to you, but worse according to others, or could still be improved to something better according to others, I don't see why one debatable system should be favored over another


Might I suggest that instead of trying to develop a new ranking system you instead work on a built-in tournament system? I've noticed that your tournaments have been very successful, and that the tournament approach is popular for Diplomacy. Why not develop that into a new feature, rather than try and replace/merge with an existing feature which has been very successful and doesn't seem to need replacement?

