forum.webdiplomacy.net

webDip dev coordination forum / public access todo list

All times are UTC




Posted: Thu Oct 02, 2008 9:51 pm

Joined: Tue Aug 26, 2008 8:46 pm
Posts: 249
I am writing an objective review of things. This is the first section, which needs comment, criticism and consensus before I can really continue. Hopefully it will make a correct decision possible, and so lead to one being made.


THE AIMS OF POINTS AND RATINGS

We cannot make any progress whatsoever in terms of consensus on, or development of, a better system without making its goals clear. They separate into two sections. The more important are the “social” factors, whilst the more discussed is the “accuracy of rating” factor. Both are difficult to consider, and both must be considered to make any progress on the current system. I shall deal with them in turn.

SOCIAL FACTORS:
When I refer to social factors, I am concerning myself with the welfare of the community on phpdiplomacy. The rating system must act as a passive regulator of game-play habits, so as to encourage those that are constructive, namely high quality of play over quantity, and the ability of players to choose to play with players of different standards. It should not run counter to active play, but should equally not penalise a player for only playing a few games.

If we have a system where people can play the number of games they like; play with people of their own standard or a range of standards as they like; and are encouraged to take the less-is-more approach to games, with fewer CDs coming from that, fewer NMRs and greater diplomacy, then we have achieved every social objective.

High quality game play encouraged. General encouragement of responsible play
Players can choose to play mixed or same ability games.
Should not make any enforcements on the quantity of games played, but rather on quality.

ACCURACY FACTORS:
This relates to the use of the system to value different players. Although this is less important than the quality of the games, getting it right is still an important goal. We must remember that “perfect” means “best approximation” rather than actually “perfect”.

Any good system must have the following traits: it is objective; it values every game equally; the reward from any game is directly related to the skill of the opposition in that game; it does not reward higher games played counts, so that newer players of equal ability have equal rating. It should also consider how a novice begins at less than average skill, but rapidly gains skill with experience. Ideally, it will not give a shown rating to a new player, since that rating will not be necessarily accurate, this can apply to any system.

Objective.
Equally valued games to stop corruption from an unexpected, surprise result.
Opponent’s skill considered.
No pressure to play many games to “keep up”, no reward for massive game counts.
New players’ lack of experience, and so of early success, considered.


Posted: Fri Oct 03, 2008 5:20 pm

Joined: Sat Sep 27, 2008 12:00 am
Posts: 44
TheGhostmaker wrote:
I am writing an objective review of things. This is the first section, which needs comment, criticism and consensus before I can really continue. Hopefully it will make a correct decision possible, and so lead to one being made.

Constructivists have demonstrated that anything we perceive, say or do is always subjective ;)
Jokes apart... with pleasure, here is my feedback. I never took the time to thank you for the massive work you did on the Elo system. I found it very inspiring and, although Glicko-2 seems to me a better alternative (at least in theory), I am eager to help and contribute to any "objective" ranking system we decide to invest our time and energy in. So: count me in! :)

Quote:
THE AIMS OF POINTS AND RATINGS - We cannot make any progress whatsoever in terms of consensus on, or development of, a better system without making its goals clear.

I think you are absolutely right on this. I believe those of us who are also somehow involved in the technical discussion (you, Darwyn, Kestas, myself and others...) should make a conscious effort to use the thread on the general forum to discuss this, rather than repeating technical considerations already made here.

Quote:
When I refer to social factors, I am concerning myself with the welfare of the community on phpdiplomacy. The rating system must act as a passive regulator of game-play habits, so as to encourage those that are constructive, namely high quality of play over quantity, and the ability of players to choose to play with players of different standards.

I do agree in general, and especially with the part I allowed myself to emphasise in your text. My way of understanding your words is that you are envisaging a system that provides information useful for making informed choices, but does not enforce limitations.
On this latter part I have a different take: I agree that members of the community should be helped in becoming good members of the community (responsible play, no CD and all the rest), but I am also aware that it takes time for newcomers even to understand what the culture of the community is, never mind conforming to it!
This is why I believe some enforcement must be made to limit the impact of newcomers. It's a bit like when you have migrants arriving in a country: before giving them full citizenship, you want them to hang around for a bit: learn the language, understand the social conventions, make sure they are not criminals, etc...

This is my list, built on the one you proposed:
  • High quality game play encouraged - Agree, but we need to break it down further. For me "high quality" means: consistent diplomatic exchange, no CD, respect for individual needs (for example if somebody needs to pause), politeness, etc...
  • General encouragement of responsible play - For me this is another way of saying the same thing as the previous point, so I would remove it.
  • Players can choose to play mixed or same ability games - Absolutely, but players need to be informed about it. This is why I think we need a ranking system (instead of the D-points).
  • Should not make any enforcements on the quantity of games played, but rather on quality - Disagree, see my next point.
  • Minimal enforcement, essentially towards newcomers - The degree of freedom in terms of the number and level of games to join should increase as the player's career progresses. For example: a newcomer could only participate in 3 games at a time. This limit could be increased to 5 once the original 3 games have ended, and could be removed once the second wave of 5 games is over, provided that no game has been abandoned in the meantime.
  • The ranking system(s) should be broadly understood by each member of the community - As these would be the tools for making informed choices, an effort should be made to make them clear (for example in the welcome e-mail when you first register, in the FAQ, etc...). It's not about obliging everybody to agree with them, but about making sure members of the community really understand what the ranking system(s) measure and why, and avoiding a situation like the present one, where some players are seriously convinced that more D-points mean better skills.
  • A statistical analysis of the community as such should be provided - This for me is essential for a member to understand the community, but also for a player to assess their own skills, strengths and weaknesses in comparison with the average / the top players / the newcomers. This analysis would also help people who are dedicated to the community and/or to the game of diplomacy to understand better the reality around them, and make more informed choices. For example... if the statistics showed that only 10% of newly registered users bother to go beyond the third game... if you saw that newcomers are normally ranked X rather than Y... if you saw that there is a peak in the increase of performance after the Xth game... then you could [put here whatever appropriate].
  • Ranks could be clustered - On this I don't fully agree with myself, anyhow... this is the idea: in many contexts where the Elo system has been implemented, top players have become "picky" in choosing adversaries and focused on their ranks more than on the game. To put it as EdiBersan did, they are now more focused on the "meta-game" than on the game. A way to prevent this would be to cluster the players in groups (for example the 10 medals) without disclosing the actual individual position: a player would therefore know that he ranks as "diplomat" along with 14 others, but wouldn't know whether he is the highest or lowest ranked in the diplomat group...
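
The staged newcomer limit proposed in the list above (3 games at a time, then 5, then no cap) could be sketched roughly as follows. This is only an illustration: the function name, its parameters, and the choice to keep the cap at 5 once any game has been abandoned are assumptions, not part of any existing phpDiplomacy code.

```python
from typing import Optional


def max_concurrent_games(finished_games: int, abandoned_games: int) -> Optional[int]:
    """Return the concurrent-game cap for a player, or None for no cap.

    Hypothetical helper: thresholds follow the 3-then-5 example in the post.
    """
    if finished_games < 3:
        return 3   # first wave: at most 3 games at a time
    if finished_games < 3 + 5 or abandoned_games > 0:
        return 5   # second wave, or a player who has abandoned a game (assumption)
    return None    # both waves finished with nothing abandoned: cap removed


print(max_concurrent_games(0, 0))
print(max_concurrent_games(4, 0))
print(max_concurrent_games(8, 0))
```

A rule of this shape involves "magic numbers" (3, 5), so the thresholds would presumably need to be agreed on rather than taken from this sketch.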

Quote:
ACCURACY FACTORS - This relates to the use of the system to value different players. Although this is less important than the quality of the games, getting it right is still an important goal. We must remember that “perfect” means “best approximation” rather than actually “perfect”.

Let's not mix the goals (quality of the games, quality of the community) with the tools. A rating system is a tool in our case: its accuracy is very important for the same reason that the quality of food is important if your goal is healthy nutrition.

My list of "good traits", again created by modifying yours:
  • it is objective - Agree purely in the sense given in mathematical discussions on ranking systems. So I wouldn't attach too much hope to this term, meaning that whatever we perceive, say and do... (you know the story here!) ;)
  • it values every game equally - In principle I agree, but I might also be convinced of the contrary: would you say that you would value the final game in the intergalactic championship the same as the game you played last Sunday with your family members? I probably would still say "yes", but other members of the community might consider that a WTA game is worth more than a PPSC one, or that games with 1-hour turns are worth more than games with 72-hour turns, or... Again: you have my vote for "yes", but I think it is not mandatory.
  • the reward from any game is directly related to the skill of the opposition in that game - I do agree, this is - for me - part of the "objectivity" of the system.
  • it does not reward higher games played counts, so that newer players of equal ability have equal rating - Partly agree: it must not create a situation in which newcomers will be forever "behind", but it is a fact that the rating of a player with a dataset of 90 games is more reliable than that of a player with 3. Glicko-2 and TrueSkill both account for this, by considering the "rating deviation" or "rating confidence".
  • It should also consider how a novice begins at less than average skill, but rapidly gains skill with experience - Disagree. A new player should begin with a rating calculated on his own performance, not on the basis of an average.
  • Ideally, it will not give a shown rating to a new player, since that rating will not be necessarily accurate, this can apply to any system - Agree, and this in fact makes your own previous point superfluous: a player will enter with his own rating as soon as the rating is reliable enough, so no need to set a newcomer's rating below the average...
  • The system accounts for less volatility of rankings given to players with wide datasets - This is the reason why in the Elo system top players use a K of 16, while normal players use a K of 32... (yet another "magic number"...). In simple words it means that players whose ranking is based on a large number of games would have smaller adjustments of their ratings than players with fewer games. Both Glicko-2 and TrueSkill have this characteristic embedded, while in the Elo system it has been injected ex post by means of the K.
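
To make the K-factor point concrete, here is a minimal sketch of the standard two-player Elo update. Diplomacy has seven players, so a real implementation would need a multiplayer adaptation that this sketch does not attempt; the function names are illustrative only.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score of `rating` against `opponent` (0..1)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def updated_rating(rating: float, opponent: float, score: float, k: float) -> float:
    """New rating after one game; `score` is 1 for a win, 0.5 draw, 0 loss."""
    return rating + k * (score - expected_score(rating, opponent))


# Same result (a win against an equally rated opponent), two K values.
print(updated_rating(1500, 1500, 1.0, k=32))  # 1516.0
print(updated_rating(1500, 1500, 1.0, k=16))  # 1508.0
```

With K halved, the same win moves the rating half as far, which is the "less volatility" described in the last bullet.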

Ok, so these were my reactions to your points. I would be pleased if we could co-operate on this! I would surely be very dismayed if all the work you did so far went nowhere.


Posted: Fri Oct 03, 2008 9:17 pm

Joined: Tue Aug 26, 2008 8:46 pm
Posts: 249
High Quality Play: This for me refers to the idea that you should be aiming for a high win percent, not a lower win percent but high game count. The quality related to the point system is the maxim of effort, so no CD, consistent exchange and so on.

Responsible play: Agreed.

Players can choose mixed or same ability games: Yes, I was going to come onto that later, but the maxim is being able to choose between the two. Being able to distinguish the two is clearly important.

Limiting Game Number: Playing a large number of games is not of itself a bad thing, playing them badly is, so the emphasis should be on that. We can make it clear that the site encourages quality play and the ranking system does so also.

Broadly understood by each member: In short, yes. We should be able to explain proportionalities and principles if not the detailed mathematics of any system.

Ranks clustered: The idea here should be that people have no need to play a meta-game by choosing opponents, because the reward/risk is constant.





Objective: I mean mathematically. I am mathematical.

Equal value: The final game in the intergalactic championship is more important because the players are better, something already considered.

Reward: Good

Rewarding higher game counts. Basically what I am saying is that there isn't a tendency to go on gaining rank by playing forever at the same standard, which happens now. Somebody who plays less falls behind.

It should also consider how a novice begins at less than average skill, but rapidly gains skill with experience. On average, new players win just 9% of the pot. That is very little indeed. We need to start them at a rating reflecting that, so that other players don't inflate their rank by playing overrated newbies. However, we need to avoid inflation and deflation, so we must feed the lost rank back in over the games to come. The concern is not that those players are overrated and so overvalued, but the consequence of that: other people become overrated.
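
As a rough check on the 9% figure: in a standard seven-player game an even share of the pot would be 1/7, or about 14.3%. The sketch below compares the two. The function name is hypothetical, and using pot share as a stand-in for strength is a simplifying assumption.

```python
def share_relative_to_even(pot_share: float, players: int = 7) -> float:
    """Ratio of an observed average pot share to the even share 1/players."""
    even_share = 1.0 / players
    return pot_share / even_share


ratio = share_relative_to_even(0.09)
print(f"even share: {1 / 7:.3f}, newcomer share: 0.090, ratio: {ratio:.2f}")
```

On this reading a newcomer takes roughly 63% of what an average player would, which is the gap a below-average starting rating is meant to reflect.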

The system accounts for less volatility of rankings given to players with wide datasets - yes, but with a cautionary note. We need to be careful about in- or de- flation, and we need to be careful about making players get stuck after playing a certain number of games. That means, we should make sure that real progress can still be made.


Posted: Tue Oct 07, 2008 1:43 am
Site Admin

Joined: Sat Jun 28, 2008 6:24 am
Posts: 892
TheGhostmaker wrote:
I am writing an objective review of things. This is the first section, which needs comment, criticism and consensus before I can really continue. Hopefully it will make a correct decision possible, and so lead to one being made.

Okay, but first I'd like to request you both cut the crap on the public forum. How TGM can promote his system while not responding to a post filled with the system's issues is beyond me. I feel like I've wasted a whole load of time going over the last Elo design spec because now it looks like it's being redesigned completely, or that something new is coming along.

So that everyone knows what a replacement system has to handle let's take a look at what the points system was designed to do and why:
Attachment: detail-graph.png

The red line is the period after 0.75 was released; a massive number of users joined, an order of magnitude more than the software had dealt with before. After everyone initially joined, the 0.72 community was totally swamped, and went from an everyone-knows-everyone atmosphere to a sea of random players. With no easy way to know who was a dedicated player and who was likely to quit, there was no way to have a public game which you could expect to be CD free. People would join a game, hope to get a good start, and leave if it didn't go quite to plan.
There were also players who would join dozens of games at a time, then have to be somewhere and have to leave them all (and play terribly in all those games anyway). Also players without an objective or goal would lose interest; after playing a game players wanted to feel like they had earned something for their success. Finally there was no real community, no way to tell who were the community members and who were the newcomers, and this means there's no loyalty/desire to stick with the site.

The end result is no-one has fun playing on phpdiplomacy.net, and the initial burst of activity starts to deflate over 6 months because the active users don't translate into a community of players, but just a bunch of anonymous people bumping into each other in an online game.

The points system was designed to fix the above problems, and it was introduced towards the end of August, where the blue arrow is. Nothing else changed at the time; the sudden increase in activity and formation of a lasting community happened as players realized they now had to stick with games to progress, that it was worth fighting for every SC you could grab, and that if you joined loads of games you'd play badly and end up losing overall.

Since then the new adjudicator helped push the rate of increase up, the improved search engine optimization helped, as did the Facebook Diplomacy hookup, but it's the points system which first started building the site that we have today.

There are a few points I'd like to take from this:
  • It's not a bad system, it does what I designed it to do and then some, probably in less code than the characters which make up this post so far. Any contending systems have their work cut out for them
  • If this was just a ranking system I wouldn't care; however the majority wanted to rank themselves would be fine with me, as I'm not going to be #1 in any ranking scheme. The problem is the ranking system is just a part of it, and it's closely tied up with things I do care about which keep the site manageable.
  • The functionality the current system provides is important, as you can see from before/after points system change
  • It's dangerous to change the points system, since it's what ties the community together, so I'm going to be very thorough when vetting new systems. I won't bet the project on "math says so"

Here is a list of the things the points system rewards and punishes.
Rewarded:
- Improving at all from a CD player's bad position (taking over CD players costs as much as their SCs are worth)
- Mingling with lots of players (Points originate from the low players with few points, you have to be part of a chain that reaches new players, players can't organize themselves into isolated groups)
- Playing regularly (Because the total amount of points going into the system only goes up there is "inflation" of points, i.e. points become worth less as time goes on, this means everyone is getting "rewarded" on average, no-one is just staying in the same spot. This results in a sort of "race" instead of someone going up at the expense of someone getting dragged down)
- Sticking to a single account (Your starting position is the worst possible position; the only incentive to create a new account is to multi-account, no-one will be scoring negatively in their account and simply start a new one. The incentive to multi-account is also diminished because points scores increase exponentially, so to get a multi-account higher up by sacrificing other accounts you'd need a whole pyramid of multi-accounters beneath the one you want to win, it gets impractical fast and the negative effects are diminished)
- Surviving (PPSC encourages newcomers to stick to the end, and form alliances. Strong alliances and draws are rewarded in low skilled games, but an incentive for newcomers to learn to talk is a positive thing)
- Winning (WTA fits on nicely to the points system, so that both PPSC and WTA can exist using the same points)
- Joining the right number of games (Players are encouraged to join as many games as they can play well in, but discouraged from joining any more than that)
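
For readers unfamiliar with the two scoring modes mentioned above, the sketch below shows one plausible reading: PPSC (points per supply centre) splits the pot in proportion to supply centres held at the end, and WTA (winner takes all) gives the whole pot to the player holding the most. The function names, the lack of rounding, and the absence of any draw handling are assumptions for illustration, not the site's actual code.

```python
from typing import Dict


def split_pot_ppsc(pot: float, supply_centers: Dict[str, int]) -> Dict[str, float]:
    """Share the pot in proportion to supply centres held (assumed PPSC rule)."""
    total = sum(supply_centers.values())
    return {player: pot * sc / total for player, sc in supply_centers.items()}


def split_pot_wta(pot: float, supply_centers: Dict[str, int]) -> Dict[str, float]:
    """Give the whole pot to the player with the most supply centres."""
    winner = max(supply_centers, key=supply_centers.get)
    return {player: (float(pot) if player == winner else 0.0)
            for player in supply_centers}


# Hypothetical end-of-game position: 7 players bet 10 points each (pot of 70),
# and three of them survive to the end.
final_position = {"England": 18, "Turkey": 10, "Italy": 6}
print(split_pot_ppsc(70, final_position))
print(split_pot_wta(70, final_position))
```

Under the proportional rule every surviving player recovers something, which matches the "Surviving" incentive in the list; under WTA only the winner does.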

Punished:
- Civil disorder (The points they invested in the game are also passed on to the other players in their game, which means dedicated players get pushed up at the expense of civil disorder players even faster than they otherwise would)
- Resting on your laurels (If someone is at the top and by playing can only go down, the system is rewarding people who leave. Because of inflation no-one can reach the top and then leave. However it doesn't get out of hand either; if Rait hadn't played at all since the points system started he'd still be in the top 15 after over a year, hardly out of control or forcing players to be tied to phpDiplomacy 24/7 like a slave. Also remember that a player will always be able to catch up to their skill level with relative ease however inflated it gets. This is why so many have now made it to Rait's old #1 3000 points position with fewer games won)
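
The inflation argument can be illustrated with simple arithmetic: a balance that stays fixed becomes a smaller fraction of all points as the total supply grows. The supply totals below are invented for illustration; only the 3000-point figure comes from the post, and the function name is hypothetical.

```python
def share_of_supply(balance: float, total_supply: float) -> float:
    """Fraction of all points in circulation held by one player."""
    return balance / total_supply


# A player sits on 3000 points while the total supply triples (made-up totals).
for total in (100_000, 200_000, 300_000):
    print(total, f"{share_of_supply(3000, total):.2%}")
```

The idle player's absolute score never drops, but their relative standing does, which is why nobody can reach the top and then simply stop playing.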

Some design principles:
- It's simple (Not only the core idea of points/pots/bets is simple, but crucially the implementation is simple too. An idea is worthless if it'd be too difficult/time-consuming to implement, or would be demanding on the server)
- No magic numbers (These just open room for more tiresome debate, and probably need to be maintained/tweaked)
- All-in-one (It's one integrated, easy to understand parameter, not a whole load of small systems)

General objectives/aims:
- It needs to encourage activity; I like seeing the code used by lots of people, and having database optimization, security, concurrency, and user interface issues truly tested. A lifeless list of people either trying to get themselves into the most favorable games or having left after making it to the top doesn't sound like an improvement
- It needs to be open and inviting even to newcomers. This was one of the main design goals of phpDiplomacy which affected loads of decisions, and a pro-oriented ranking scheme wouldn't fit in with phpDiplomacy's attempt to be able to host a wide range of skill together as well as possible
- It should be enjoyable to compete in, phpDiplomacy is about hosting an active community and fun diplomacy games (preferably for a wide definition of "fun diplomacy game", from easy to hard etc), it isn't about hosting a ranking list
- All games shouldn't be equal; players should be able to play an "important" game which they play seriously and try hard in, and at the same time be able to play practice/off-beat games where they try a variant ruleset (e.g. gunboat), a different tactic, teach someone to play, etc


Specification info:
- A design spec is a doc containing a summary of all the info needed to turn the system into code, leaving no ambiguity. It has to be something a developer can take and fully analyze and code. Think of each of the places where the rating system comes into play, and figure out the inputs, outputs, formulas and/or algorithms which would be used in that place
- Even if I end up approving a design spec I won't be implementing it. The points system does everything I want it to do and has since Aug 07
- I won't look at and figure out problems with a design spec just to be presented with another until it works


Hope that helps


Last edited by kestasjk on Wed Oct 08, 2008 6:12 pm, edited 1 time in total.
"Joining the right number of games", "All games shouldn't be equal"


Posted: Tue Oct 07, 2008 6:46 am

Joined: Tue Aug 26, 2008 8:46 pm
Posts: 249
Thank you.

I stopped responding to the other thread because it was clear I hadn't a full idea of what you wanted, and had a vast amount of schoolwork.

Now I can look at this as a benchmark, and decide what needs to be done to make Elo work for phpdiplomacy.


Posted: Tue Oct 07, 2008 7:04 am

Joined: Sat Sep 27, 2008 12:00 am
Posts: 44
Thanks kestas for having taken the time to interact on this. A few reactions...

kestasjk wrote:
Hope that helps

Tremendously, as that gives us the possibility to rephrase and complete our proposals/ideas in a way that follows your logic/priorities, thus resulting in something easier for you to evaluate. As specified in the other forum, I am a bit short of time during the next few days (lots of travelling around the globe) but I will for sure respond extensively ASAP.

Quote:
Even if I end up approving a design spec I won't be implementing it. The points system does everything I want it to do and has since Aug 07

You already made this clear some time ago. You know my position: I might disagree on the wisdom of this choice in absolute terms, but I think it is your legitimate, full right.

Quote:
I won't look at and figure out problems with a design spec just to be presented with another until it works

This is not clear to me. What do you mean? Does it mean that you want to receive only one proposal that you might "approve" but not implement? Or does it mean that whatever specs one submits to you will eventually get approved without you reading them? Or... ? Clarification needed.

Quote:
Okay, but first I'd like to request you both cut the crap on the public forum.

This is a very subjective evaluation of community members' contributions. As far as I am concerned, I am more interested in helping people broaden their horizons of understanding, and consequently enriching mine by interacting with them, than in writing specs that could be approved but never implemented. I am not saying that I will not take the time to eventually write them (maybe yes, maybe not... however they won't be implemented, so it does not change much anyhow). What I am saying is that I don't see why I should stop debating the pros and cons of any idea (mine or others') with other users. I do not believe that hurts, especially not in the community of a game where communication and influencing skills are the pivotal element for one's victory.

If there is something particular that annoys you in TheGhostmaker's or my interventions in the public forum, I would be glad to receive more structured feedback.

Thanks a lot again for your input though. I will respond to its content ASAP! :)


Posted: Tue Oct 07, 2008 5:11 pm

Joined: Tue Aug 26, 2008 8:46 pm
Posts: 249
I would just like to say that the Elo system can be made to follow those criteria. I am less sure that it can be made to do so in a way that people understand, so that they actually do follow them.


Posted: Wed Oct 08, 2008 6:24 pm
Site Admin

Joined: Sat Jun 28, 2008 6:24 am
Posts: 892
mac wrote:
Quote:
I won't look at and figure out problems with a design spec just to be presented with another until it works

This is not clear to me. What do you mean? Does it mean that you want to receive only one proposal that you might "approve" but not implement? Or does it mean that whatever specs one submits to you will eventually get approved without you reading them? Or... ? Clarification needed.

Well, having made an effort to understand TGM's Elo system in depth and provide full feedback, now it looks like another one is in the works. I can understand that this time around this was because I hadn't provided a comparison feature set, but in future a cycle of "Post a spec, get in-depth analysis and list of problems, post a new spec, repeat until no problems" won't be an acceptable way to reach a new system; I'm not patient enough / don't have enough time.

mac wrote:
If there is something particular that annoys you in TheGhostmaker's or my interventions in the public forum, I would be glad to receive more structured feedback.

Declaring that your system = true / mathematically provably superior based on your definition of superiority, or in TGM's case saying that the current system ought to be scrapped without responding to the alternative's problems, are what I was responding to there. Constructive debate is fine, but declaring victory and rallying support for an unfinished system isn't.


Posted: Wed Oct 08, 2008 7:14 pm

Joined: Tue Aug 26, 2008 8:46 pm
Posts: 249
I might say that that was in no way serious. Nor is that my position.


Posted: Wed Oct 22, 2008 12:50 am

Joined: Sat Sep 27, 2008 12:00 am
Posts: 44
Thanks Kestas for the answers. I read them only now as I stupidly did not notice I had been logged off, so I did not realise new msgs were there to be read. :oops:

