## GRADING ANOMALIES

General discussions about grading.
Roger de Coverly
Posts: 19453
Joined: Tue Apr 15, 2008 2:51 pm

### Re: GRADING ANOMALIES

The question is whether one can find a better rule (or formula) which would result in a grading system needing less or no adjustment as seasons go by.
It's my opinion that the ECF are completely in the wrong to break the historic series of grades. By all means review such ad-hoc factors as the 30 game rule, the 40 point rule, the junior increments, the estimation of new players and the treatment of rapidly improving players, but do it by parallel running against the established system so that you know the magnitude and effect of each proposed change.

If you want to abandon the long standing principle of equal grade for equal play, then just move sideways onto an Elo system rather than perpetuate another national system.

Robert Jurjevic
Posts: 207
Joined: Wed May 16, 2007 1:31 pm
Location: Surrey

### Re: GRADING ANOMALIES

ÉGS2 vs GS

(In each example the ÉGS2 calculation was performed as if all the grades were equally trusted.)

1) Let us assume that there are two player pools, one with 365 players graded 125 and the other with 365 players graded 150. Let us assume that each player from the 125 pool plays each player from the 150 pool (there are 133,225 games in total) and that all games are drawn. According to GS, after calculating new grades the 125 pool will become a 150 pool and the 150 pool will become a 125 pool. According to ÉGS2, after calculating new grades both pools will become 138 pools.

2) Let us assume that there are two player pools, one with 365 players graded 120 and the other with 365 players graded 100. Let us assume that each season each player from the 120 pool plays each player from the 100 pool (there are 133,225 games in total) and that all games are drawn. According to GS, after calculating new grades the 120 pool will become a 100 pool and the 100 pool will become a 120 pool, and the grades will keep alternating between the pools as the seasons pass. According to ÉGS2, after calculating new grades both pools will become 110 pools, and will then stay as such in each subsequent season.

3) Let us assume that there are two player pools, both with 365 players graded 100. Let us assume that each player from one pool plays each player from the other pool (there are 133,225 games in total) and that all games are won by the players of one pool. According to GS, after calculating new grades the winning pool will become a 150 pool and the losing pool will become a 50 pool. According to ÉGS2, after calculating new grades the winning pool will become a 125 pool and the losing pool will become a 75 pool.
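A small Python sketch reproduces the numbers in all three examples (my translation, not the poster's code: `gs_new` assumes the ECF per-game convention that a game counts as the opponent's grade plus the score margin, and `egs2_new` uses the logistic 'p = f(d)' with the spread constant 'g = 25*Log[10]/Log[3]' from the code later in the thread; the function names are mine):

```python
import math

G = 25 * math.log(10) / math.log(3)  # logistic spread: d = 25 gives p = 75%

def expected(d):
    """Expected percentage score for grade difference d (logistic p = f(d))."""
    return 100 / (1 + 10 ** (-d / G))

def gs_new(opp, q):
    """ECF-style per-game rule: performance = opponent's grade + (q - 50).
    When every game is against the same opponent grade, the season average
    is just this value."""
    return opp + (q - 50)

def egs2_new(own, opp, q, k=0.5):
    """EGS2 update with equally trusted grades (k = 1/2)."""
    return own + k * (q - expected(own - opp))

# Example 1: a 125 pool draws every game against a 150 pool.
print(round(gs_new(150, 50)))         # GS: 150 (the pools swap grades)
print(round(egs2_new(125, 150, 50)))  # EGS2: 138
print(round(egs2_new(150, 125, 50)))  # EGS2: 138

# Example 3: two 100 pools, one wins every game.
print(round(gs_new(100, 100)), round(gs_new(100, 0)))                # 150 50
print(round(egs2_new(100, 100, 100)), round(egs2_new(100, 100, 0)))  # 125 75
```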

My thoughts on why, in my opinion, the ÉGS2 results make more sense than the GS results in each example:

1) When you look at the example from the perspective of a single player from the 125 pool, one may think: the player scored 50% against a pool of 150 players, therefore he must be a 150 player. But if you look at the example from the pool perspective, it seems more plausible to me that after calculating new grades the pool grades should become equal, as both pools scored 50%. This implies that, when looking at the example from the perspective of a single 125 player, one should not assume that the grade of the 150 pool is fixed (and hence that a 125 player scoring 50% against it must be a 150 player). Seen from the pool perspective, the 125 player was in fact playing a degrading pool of players whose average grade dropped from 150 to 138, so in my opinion it is fairer to assume that a 125 player who scored 50% against the pool is a 138 rather than a 150 player.

2) This is a version of the "lighthouse keeper" example which most of you would agree demonstrates a known theoretical disadvantage of GS.

3) If we neglect the 40 point rule in GS, the minimum grade difference required for a performance of 100% is 50, so it would seem that the new pool grades calculated by GS are too far apart, i.e., the grades were unnecessarily stretched.
Robert Jurjevic
Vafra

Roger de Coverly
Posts: 19453
Joined: Tue Apr 15, 2008 2:51 pm

### Re: GRADING ANOMALIES

Let us assume that each player from 125 pool plays each player from 150 pool (there are 133,225 games in total) and that all games are drawn.
I don't think you should evaluate grading systems based on 1 in a billion events.

The everyday point is that players improve from 125 to 150, or 150 to 175, every season, and a grading system should have a means of revaluing them. Here's a closer to home example. If you play in the top section of weekend tournaments and score around 50%, then you usually get a grading performance of 175 ish. So if a player steps up from playing in the grade-restricted tournaments, plays in the opens and scores 50%, then for a suitably large number of games he should get a grade of 175. That's the same grade as someone who perpetually scores around that mark, and the same grade as last season's 200 player who had a relatively bad year. It's also the same grade as a new player scoring 50% who hasn't previously played in the English system.

I would rather assume that one player from the 125 pool plays 30 players from the 150 pool and scores 50%. Also one player from the 150 pool plays 30 games against the 125 pool and scores 50%. On the ECF system we have one transfer from the 125 pool to the 150 pool and one from the 150 pool to the 125 pool. I see nothing wrong with this and no reason to slow down the transition unless it was done as part of a package of changes including the adoption of an Elo based system.
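That transfer arithmetic can be sketched in a couple of lines (my sketch, assuming the per-game convention that a draw counts as the opponent's grade and a win or loss as that grade plus or minus 50):

```python
# 30 games against 150-graded opposition at 50%: 15 wins, 15 losses.
perfs = [150 + 50] * 15 + [150 - 50] * 15
new_grade = sum(perfs) / len(perfs)
print(new_grade)  # 150.0 -> a clean transfer into the 150 pool
```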

Roger de Coverly
Posts: 19453
Joined: Tue Apr 15, 2008 2:51 pm

### Re: GRADING ANOMALIES

Just another thought on averaging.

If in the ECF system you increased the minimum games cutoff from 30 to 60 and if you assumed everybody played exactly 30 games, then the example of a 125 player who upgrades to playing 150 opposition and scoring 50% would get a first season new grade of 138 exactly as in this EGS2 thing. So the ECF system already has some of the averaging. With the current 30 game cutoff, exact averaging applies at 15 games.

Robert Jurjevic
Posts: 207
Joined: Wed May 16, 2007 1:31 pm
Location: Surrey

### Re: GRADING ANOMALIES

Roger de Coverly wrote:I don't think you should evaluate grading systems based on 1 in a billion events.
Extreme cases may help in showing the differences between the systems. Say, you would not compare Einstein's to Newton's theory using experiments in which both theories give virtually the same results, but would rather search for the extreme cases in which the theories predict measurably different results.
Roger de Coverly wrote:The everyday point is that players improve from 125 to 150 or 150 to 175 every season and a grading system should have a means of revaluing them.
If a 125 player scores 75% against a field of players graded 150, then according to ÉGS2 (assuming that all grades are equally trusted) the 125 player would improve to a 150 player, so ÉGS2 would allow for it, but it would require a performance of 75% rather than 50%.
Roger de Coverly wrote:I would rather assume that one player from the 125 pool plays 30 players from the 150 pool and scores 50%. Also one player from the 150 pool plays 30 games against the 125 pool and scores 50%. On the ECF system we have one transfer from the 125 pool to the 150 pool and one from the 150 pool to the 125 pool. I see nothing wrong with this and no reason to slow down the transition unless it was done as part of a package of changes including adopting of an Elo based system.
If we assume that the new grades of all other players in the pools remain unchanged (i.e., all other 150 players remain 150 and all other 125 players remain 125) then indeed your argument holds and the transition should be fair. ÉGS2 would require the 125 player to score 75% and the 150 player to score 25% in order for the transition to take place, when it looks like 50% for both players should be enough.

Actually, even if the 150 pool grade drops and the 125 pool grade rises, 50% should be a fair performance for the transition to take place if we do not wish to correct for grade change during the course of the season.

Maybe an ÉGS3 system, where 'ka' and 'kb' are doubled in comparison to 'ka' and 'kb' in ÉGS2, should be considered. ÉGS3 is similar to GS except that it uses the logistic curve for 'p = f(d)' and fine-tunes the grade change based on how much the grades are trusted (the more a grade is trusted the less it changes, and vice versa; that is a simple emulation of the difference between the Glicko and Élo systems).

Code: Select all

``````(* EGS3 *)
ClearAll[a, b, a2, b2, d, g, ka, kb, p, q, na, nb];
na = 30; nb = 30;
a = 125; b = 150;
q = 50;
d = a - b;
g = (25*Log[10])/Log[3]; (* ~52.4; gives p = 75 for d = 25 *)
ka = If[na + nb > 0, 2*nb/(na + nb), 1];
kb = If[na + nb > 0, 2*na/(na + nb), 1];
p = 100/(1 + 10^(-d/g));
a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
Round[N[a2]]
Print[];
Round[N[b2]]
``````
Robert Jurjevic
Vafra

Roger de Coverly
Posts: 19453
Joined: Tue Apr 15, 2008 2:51 pm

### Re: GRADING ANOMALIES

If we assume that the new grades of all other players in the pools remain unchanged (i.e., all other 150 players remain 150 and all other 125 players remain 125) then indeed your argument holds and the transition should be fair.
At the serious risk of repeating myself, the ECF (and usually Elo systems) always base performance on the published grade at the start of a period. So there is no need to make any assumption about the performance of "other" players. Thus once a player is 150 in a published list, he stays at that level for all opponents until the next published list, whatever his performance over the rating period. The frequency of the published list is thus one of the parameters (as we have seen with the recent FIDE discussion). Another of the parameters of a grading system is the treatment of new players. Some systems just ignore games against unrated players. There may be a case for a rule change in the ECF system to ignore previous published grades and treat rapidly improving players as new players.

Robert Jurjevic
Posts: 207
Joined: Wed May 16, 2007 1:31 pm
Location: Surrey

### Re: GRADING ANOMALIES

Roger de Coverly wrote:If in the ECF system you increased the minimum games cutoff from 30 to 60 and if you assumed everybody played exactly 30 games, then the example of a 125 player who upgrades to playing 150 opposition and scoring 50% would get a first season new grade of 138 exactly as in this EGS2 thing. So the ECF system already has some of the averaging. With the current 30 game cutoff, exact averaging applies at 15 games.
It looks like the same would hold with the current limit of 30 games: if you assumed that everybody played exactly 15 games, then the example of a 125 player who upgrades to playing 150 opposition and scoring 50% would get a first season new grade of 138, exactly as in this EGS2 thing. This is good in my opinion, as one would wish to change grades more rapidly only if at least a minimum number of games was played in the season. If you required a minimum games cutoff of 60 then you would either slow down the grade change (if the number of games played remained the same) or the grade change would remain unchanged (if the number of games played was doubled) but the accuracy would increase, as the result would be based on a larger statistical sample.
Last edited by Robert Jurjevic on Fri May 22, 2009 2:21 pm, edited 1 time in total.
Robert Jurjevic
Vafra

Robert Jurjevic
Posts: 207
Joined: Wed May 16, 2007 1:31 pm
Location: Surrey

### Re: GRADING ANOMALIES

Roger de Coverly wrote:
If we assume that the new grades of all other players in the pools remain unchanged (i.e., all other 150 players remain 150 and all other 125 players remain 125) then indeed your argument holds and the transition should be fair.
At the serious risk of repeating myself, the ECF (and usually Elo systems) always base performance on the published grade at the start of a period. So there is no need to make any assumption about the performance of "other" players. Thus once a player is 150 in a published list, he stays at that level for all opponents until the next published list, whatever his performance over the rating period. The frequency of the published list is thus one of the parameters (as we have seen with the recent FIDE discussion).
Completely agree with you.

Roger de Coverly wrote:Another one of the parameters of a grading system is the treatment of new players. Some systems just ignore games against unrated players. There may be a case for a rule change in the ECF system to ignore previous published grades and treat rapidly improving players as new players.
ÉGS3 (well, I shall say ÉGS3 not ÉGS2 now), when calculating the grades of established players, would ignore all the games they played against ungraded players, while when calculating the grades of ungraded players, all of the games will be taken into account. Actually, ÉGS3 will make this correction, based on how much one's grade is trusted, for all games (say, if a 125 player played only 2 games last season and drew against a 150 player who played 60 games last season, the 125 player would be almost maximally rewarded while the 150 player would be hardly penalized; this is basically an idea of Professor Mark Glickman, the inventor of the Glicko and Glicko2 grading systems), which could be a good thing, I think (please look at the formulae below).

The problem of players whose grade rapidly changes is another issue, which is addressed in the Glicko2 system, but I haven't found a simple solution for the ECF grading system (yet). ÉGS3 (Élo Grading System three) formulae:

Let 'a' and 'b' be the grades of players 'A' and 'B', 'p' the expected performance of player 'A' (the expected performance of player 'B' is then '100 - p'), 'q' the actual performance of player 'A' (the actual performance of player 'B' is then '100 - q'), 'd = a - b' the grade difference, and 'na' and 'nb' the number of games players 'A' and 'B' played in the season for which grades 'a' and 'b' were calculated. Then the new grades of players 'A' and 'B', 'a2' and 'b2', are calculated as follows:

Code: Select all

``````(* EGS3 *)
ClearAll[a, b, a2, b2, d, g, ka, kb, p, q, na, nb];
na = 30; nb = 30;
a = 125; b = 150;
q = 50;
d = a - b;
g = (25*Log[10])/Log[3]; (* ~52.4; gives p = 75 for d = 25 *)
ka = If[na + nb > 0, 2*nb/(na + nb), 1];
kb = If[na + nb > 0, 2*na/(na + nb), 1];
p = 100/(1 + 10^(-d/g));
a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
Round[N[a2]]
Print[];
Round[N[b2]]
``````
(if players 'A' and 'B' had played only one game in the season, 'q' is either 100, 0 or 50; if they played more than one game it can be a number between 0 and 100 inclusive; 'Log[z]' gives the natural logarithm of 'z' (logarithm to base 'e'); 'x^y' gives 'x' to the power 'y'; the input parameters for GS are 'a', 'b' and 'q'; the input parameters for ÉGS3 are 'a', 'b', 'na', 'nb' and 'q'; the output parameters are 'a2' and 'b2')

Note: The formulae are used to calculate a new grade of player 'A' for every opponent 'B' he or she played in the season. At the end of the season an average of the calculated grades (over every opponent 'B') is taken, and this average is player 'A''s new grade for the season (for GS, if a player has not played enough games in the season, games from a previous season or seasons will be taken into the calculation; for ÉGS3 no games from previous seasons need to be taken into account).
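The ÉGS3 formulae translate directly into Python (a sketch only; `egs3` is my name for the update function, and the logistic constant is taken from the Mathematica snippet):

```python
import math

G = 25 * math.log(10) / math.log(3)  # g = 25*Log[10]/Log[3] from the snippet

def egs3(a, b, q, na, nb):
    """One EGS3 update: the fewer games a player's grade rests on,
    the larger his k factor and the faster his grade moves."""
    p = 100 / (1 + 10 ** (-(a - b) / G))           # expected score of A
    ka = 2 * nb / (na + nb) if na + nb > 0 else 1
    kb = 2 * na / (na + nb) if na + nb > 0 else 1
    return a + ka * (q - p), b + kb * ((100 - q) - (100 - p))

# The snippet's own inputs: a = 125, b = 150, q = 50, na = nb = 30.
a2, b2 = egs3(125, 150, 50, 30, 30)
print(round(a2), round(b2))  # 150 125
```

With equally trusted grades the doubled k factors let a 50% score complete the 125-to-150 transfer in one season, which is the behaviour discussed above.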
Robert Jurjevic
Vafra

Roger de Coverly
Posts: 19453
Joined: Tue Apr 15, 2008 2:51 pm

### Re: GRADING ANOMALIES

say if a 125 player played only 2 games last season and drew against a 150 player who played 60 games last season, the 125 player would be almost maximally rewarded while the 150 player would be hardly penalized
If we assume the 125 was based on 30 games prior season, then the effect of 2 drawn games against 150 players in the ECF system is

(125 * 28 + 150 * 2 )/30 = 127 (rounded)

So a 2 point gain. The 150 player is almost unaffected.

If the 125 was only based on say 9 games (the minimum to qualify for publication) then (125 *9 + 150 * 2) /11 = 130 rounded

So it's a bigger change but the grading system is making an inference that a player who draws 2 games from 11 against 150 opposition is a bit "better" than 125.
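The two calculations can be reproduced verbatim (the second pads a grade based on only 9 prior games with the 2 new results):

```python
# 2 draws against 150 opposition on top of a 125 grade from 28 counted games:
print(round((125 * 28 + 150 * 2) / 30))  # 127
# The same 2 draws on top of only 9 prior games:
print(round((125 * 9 + 150 * 2) / 11))   # 130
```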

Some authors, notably EM White, think this lack of symmetry is a cause of inflation, deflation or spread.

I see you mention the Glicko system. From what I've heard, one of its underlying premises (that rating reliability reduces with inactivity) is complete rubbish when applied to the top players in a national rating system. So if you are a seasoned international player but you only play one event a year in a particular country, then you get an immensely high k factor for that event, as you are incorrectly assumed to be inactive for the rest of the year.

Robert Jurjevic
Posts: 207
Joined: Wed May 16, 2007 1:31 pm
Location: Surrey

### Re: GRADING ANOMALIES

Roger de Coverly wrote:
say if a 125 player played only 2 games last season and drew against a 150 player who played 60 games last season, the 125 player would be almost maximally rewarded while the 150 player would be hardly penalized
If we assume the 125 was based on 30 games prior season, then the effect of 2 drawn games against 150 players in the ECF system is

(125 * 28 + 150 * 2 )/30 = 127 (rounded)

So a 2 point gain. The 150 player is almost unaffected.

If the 125 was only based on say 9 games (the minimum to qualify for publication) then (125 *9 + 150 * 2) /11 = 130 rounded

So it's a bigger change but the grading system is making an inference that a player who draws 2 games from 11 against 150 opposition is a bit "better" than 125.

Some authors, notably EM White think this lack of symmetry is a cause of inflation or deflation or spread.
Right, ÉGS3 does not preserve the total grade in the system.

If a 125 player draws against the 150 player pool, then the three characteristic cases are:

ÉGS3:

1) the 125 player is ungraded and none of the pool players is ungraded: the new grade of the 125 player is 175 (the pool grade is unaffected)

2) the 125 player's grade and the grades of all of his opponents from the pool are equally trusted: the new grade of the 125 player is 150 (the pool grade is moderately affected)

3) the 125 player is graded and all of the pool players are ungraded: the new grade of the 125 player is 125 (the pool grade is maximally affected)

ÉGS2:

1) the 125 player is ungraded and none of the pool players is ungraded: the new grade of the 125 player is 150 (the pool grade is unaffected)

2) the 125 player's grade and the grades of all of his opponents from the pool are equally trusted: the new grade of the 125 player is 138 (the pool grade is moderately affected)

3) the 125 player is graded and all of the pool players are ungraded: the new grade of the 125 player is 125 (the pool grade is maximally affected)
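The six cases can be reproduced with a small Python sketch (my translation of the k-factor definitions from the snippets, with 'na' and 'nb' standing for last season's game counts and 'na = 0' modelling an ungraded player; function names are mine):

```python
import math

G = 25 * math.log(10) / math.log(3)  # logistic spread: d = 25 gives p = 75%

def new_grade(a, b, q, k):
    """One EGS-family update of player A's grade with sensitivity factor k."""
    p = 100 / (1 + 10 ** (-(a - b) / G))
    return round(a + k * (q - p))

def k_egs3(na, nb):
    return 2 * nb / (na + nb) if na + nb > 0 else 1

def k_egs2(na, nb):
    return nb / (na + nb) if na + nb > 0 else 0.5

# (na, nb): ungraded 125 player / equal trust / ungraded pool
for na, nb in [(0, 30), (30, 30), (30, 0)]:
    print(new_grade(125, 150, 50, k_egs3(na, nb)),
          new_grade(125, 150, 50, k_egs2(na, nb)))
# EGS3 column: 175, 150, 125; EGS2 column: 150, 138, 125
```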

I do not know which results, ÉGS3's or ÉGS2's, make more sense; maybe both are unacceptable.

If you can find a plausible explanation for ÉGS2's case 2 above, then ÉGS2 would be the system which makes the most sense (all other systems fail on one or more of the other examples)...
...I feel that ÉGS2 is the best!

(Why should we stick to "equal grade for equal performance"? If we decide to abandon it we would be able to adopt ÉGS2, which behaves much better in all the other cases than any of the mentioned systems. Note that we still stick to "equal grade for equal performance" if the player in question is ungraded and his opponent is not.)
Roger de Coverly wrote:I see you mention the Glicko system. From what I've heard , one of its underlying premises ( that rating reliability reduces with inactivity ) is complete rubbish when applied to the top players in a national rating system. So if you are a seasoned international player but you only play one event a year in a particular country then you get an immensely high k factor for that event as you are incorrectly assumed to be inactive for the rest of the year.
Right, in exceptional cases like these one would have to set 'ka = kb = 1' in ÉGS3 and 'ka = kb = 1/2' in ÉGS2, overriding the formulae results.

By the way, Glicko (i.e., Glicko 1) has been used on FICS (Free Internet Chess Server) for quite a few years now; there is a so-called RD factor:

RD stands for ratings deviation. RD is a statistical measure of how stable your rating is. For the most part, RD goes down when you play a lot of games and goes up if you play infrequently. RD is used in the formula that adjusts your rating after a chess match; lower RD values mean that your rating will not change as much. RD values are listed in finger displays of ratings, in match challenges and in assess information. For further details about RD and how it is used in ratings, read "help glicko".

My simple emulation of Glicko's RD is done in the 'ka' and 'kb' factors, based on the number of games played in the last season, 'na' and 'nb':

ÉGS3:

Code: Select all

``````ka = If[na + nb > 0, 2*nb/(na + nb), 1];
kb = If[na + nb > 0, 2*na/(na + nb), 1];
``````
ÉGS2:

Code: Select all

``````ka = If[na + nb > 0, nb/(na + nb), 1/2];
kb = If[na + nb > 0, na/(na + nb), 1/2];
``````
Cautious...

A fairly safe change could be to switch from GS to ÉGS4, where ÉGS4 is GS with 'p = f(d)' replaced by the logistic curve.

The only difference between ÉGS4 and ÉGS is in the definition of the factors 'ka' and 'kb':

ÉGS4:

Code: Select all

``````ka = 1; kb = 1;
``````
ÉGS:

Code: Select all

``````ka = 1/2; kb = 1/2;
``````
ÉGS4 (Élo Grading System four) formulae:

Let 'a' and 'b' be the grades of players 'A' and 'B', 'p' the expected performance of player 'A' (the expected performance of player 'B' is then '100 - p'), 'q' the actual performance of player 'A' (the actual performance of player 'B' is then '100 - q'), and 'd = a - b' the grade difference. Then the new grades of players 'A' and 'B', 'a2' and 'b2', are calculated as follows:

Code: Select all

``````(* EGS4 *)
ClearAll[a, b, a2, b2, d, g, ka, kb, p, q];
a = 125; b = 150;
q = 50;
d = a - b;
g = (25*Log[10])/Log[3]; (* ~52.4; gives p = 75 for d = 25 *)
ka = 1; kb = 1;
p = 100/(1 + 10^(-d/g));
a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
Round[N[a2]]
Print[];
Round[N[b2]]
``````
(if players 'A' and 'B' had played only one game in the season, 'q' is either 100, 0 or 50; if they played more than one game it can be a number between 0 and 100 inclusive; 'Log[z]' gives the natural logarithm of 'z' (logarithm to base 'e'); 'x^y' gives 'x' to the power 'y'; the input parameters for GS and ÉGS4 are 'a', 'b' and 'q'; the output parameters are 'a2' and 'b2')

Note: The formulae are used to calculate a new grade of player 'A' for every opponent 'B' he or she played in the season. At the end of the season an average of the calculated grades (over every opponent 'B') is taken, and this average is player 'A''s new grade for the season (for GS and ÉGS4, if a player has not played enough games in the season, games from a previous season or seasons will be taken into the calculation).
Robert Jurjevic
Vafra

Robert Jurjevic
Posts: 207
Joined: Wed May 16, 2007 1:31 pm
Location: Surrey

### Re: GRADING ANOMALIES

I think that I know why GS is stretching the grades!  [version 12/06/2009 2.1]

The point...

My point in a nutshell is that the (wrongly chosen) 'k' factors in the current grading system are causing the grade stretching, and that the amount of stretch due to the 'k' factors (in fact it is equal to '|p - q|') is larger than the grade fluctuations caused by other anomalies, which may be corrected by using the FIDE logistic relation for 'p = f(d)', Glickman's idea of changing less trusted grades (based on frequency of play) faster than more trusted grades, or even a solution to the "junior problem".

The main flaw (rule-wise) in the current grading system is that it applies the current grading rule to change the grades of both players in the game (it should apply it to change the grade of only one of the players, or use a different 'good' rule applied to both players).

The main flaw (formula-wise) in the current grading system is that it uses 'ka = kb = 1' in the formulae (it should use 'ka = kb = 1/2', or variable 'ka' and 'kb' such that 'ka + kb = 1').

Factors 'k' and grade stretching...

The three relationships 'p = f(d)' (green, blue and red lines) match pretty closely for '-30 <= d <= 30' and all predict that for a grade difference 'd' of 30 grading points the expected performance 'p' is approximately 80% (actually the relationships marked with the green and blue lines expect 80.0000% and the one with the red line 78.8905%) (please see figure 1 below).

Figure 1: Relationship between expected performance 'p' and grade difference 'd' as defined in GS (green line), CGS, AGS and AGS2 (blue line) and ÉGS, ÉGS2, ÉGS3 and ÉGS4 (red line). Expected performance 'p' is a function of grade difference 'd', i.e., 'p = f(d)'.

Let us assume that two pools of players, both with an average grade of 100, play each other during the course of a season and that one of the pools scores 80%. Let us assume that each player in a pool plays only players of the other pool and that each player plays exactly 30 games (for simplicity we can assume that in each pool there are 30 players, each graded 100, and that each player from one pool plays each player from the other pool, totalling 900 games). Then it follows (from the relationships 'p = f(d)') that at the end of the season one of the player pools should be regarded as stronger than the other by approximately 30 grading points (because it scored 30% more).

(Note that it is unlikely that one of the pools would score so high in practice, though in order to please those who might be troubled with that, we could assume that, say, players of the well performing pool are all juniors who had been lucky enough to be coached by Garry Kasparov in the summer break before the start of the season.)

According to GS the new grades of the player pools in the above example are 130 and 70 (the pool grades drift apart by '130 - 70 = 60' grading points).

According to ÉGS the new grades of the player pools in the above example are 115 and 85 (the pool grades drift apart by '115 - 85 = 30' grading points).

As the grade drifts are '115 - 85 = 30' and '130 - 70 = 60', it is obvious that the current grading system (GS) stretched the grades by 30 grading points (the new grade difference calculated by GS is twice as large as it should have been)!

(You see how ÉGS is fair: it did not assign grades of 130 and 100, as it did not assume that the better pool improved while the other stayed as it was, but guessed that the result was due to one of the pools improving and the other worsening. Though if Kasparov really did coach the juniors in the better pool, the grades of 130 and 100 would have been a better guess. The GS grades of 130 and 70 make no sense at all: if the better pool was given 130 the other pool should have been given 100, not 70; giving 70 to the other pool is as if the better pool had scored approximately 94%, according to Élo's logistic 'p = f(d)'.)

(Note that ÉGS2 would assign grades of 130 and 100 if all of the players in the better pool were ungraded and all of the players in the other pool were graded; that is because ÉGS2 changes less trusted grades more rapidly than more trusted grades, and in this extreme case the grades of graded players remain unaffected by games played against ungraded players. Well, it would be nice if we could take into account whether, say, Kasparov was coaching a player, but...)

In my opinion, the main reason for the grade stretching is the factor 'k', which is twice as large in GS as in ÉGS and ÉGS2 (please note that in the above example we eliminated the differences in 'p = f(d)', so 'p = f(d)' couldn't be a cause of the stretching).
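The k-factor effect can be isolated in a few lines (my sketch; the same logistic 'p = f(d)' is used for both settings, so only 'ka' and 'kb' differ):

```python
import math

G = 25 * math.log(10) / math.log(3)  # logistic spread constant

def update(a, b, q, ka, kb):
    """Update both grades; ka = kb = 1 mimics GS, ka = kb = 1/2 mimics EGS."""
    p = 100 / (1 + 10 ** (-(a - b) / G))
    return a + ka * (q - p), b + kb * ((100 - q) - (100 - p))

# Two 100-graded pools, one scoring 80% (p = 50 at equal grades):
print(update(100, 100, 80, 1, 1))      # (130.0, 70.0) -> drift of 60 points
print(update(100, 100, 80, 0.5, 0.5))  # (115.0, 85.0) -> drift of 30 points
```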

Statement 1: If, for a grading system (of those mentioned so far), 'ka + kb = 1' holds (in its formulae), the system neither stretches nor shrinks the grades; if 'ka + kb > 1' the system stretches the grades, and if 'ka + kb < 1' the system shrinks the grades.

Of the systems mentioned so far, GS, CGS, ÉGS3 and ÉGS4 stretch the grades, and AGS, AGS2, ÉGS and ÉGS2 neither stretch nor shrink the grades (none of the systems shrinks the grades).

"Equal grade for equal performance"...

Systems which neither stretch nor shrink the grades do not obey the rule known as "equal grade for equal performance": say you have a 130 player who scores 50% against a pool of 160 players; the "equal grade for equal performance" rule requires that the 130 player becomes a 160 player (according to the systems which neither stretch nor shrink the grades, the 130 player becomes approximately a 145 player).

So it would seem that one could opt either for a system which obeys "equal grade for equal performance" rule and stretches the grades or a system which does not obey "equal grade for equal performance" rule and neither stretches nor shrinks the grades.

Let us assume that in the above "equal grade for equal performance" example the 130 player played 300 games during a course of a season and that each player in the pool (there are 10 players in the pool) played 30 games against the 130 player. Then, taking into account only the 300 games the pool players played against the 130 player, it follows (from the relationships 'p = f(d)') that at the end of the season the 130 player should be regarded approximately equally strong as the pool of players he played.

"Equal grade for equal performance" rule requires that new grade of the 130 player is 160 (assuming that the pool grade stays approximately 160).

Taking into account only the 300 games the pool players played against the 130 player, a system which neither stretches nor shrinks the grades requires that new grade of both the 130 player and the pool is approximately 145.

If all the pool players really performed at their level of 160 and the 130 player did improve, the 130 player should become a 160 player and the pool players should remain 160. The problem with the current grading system is that even as it assigns 160 to the 130 player, it penalizes the pool players for the games which they played against the 130 player, which is what causes the grade stretching.

A system which didn't stretch the grades would, if assigning the 130 player a grade of 160, have to ignore the games the pool players played against the 130 player when calculating the grades of the pool players (as it is already assumed that they performed at the 160 level, the games they played against the 130 player should have no effect on their grade). Alternatively, it could assume that both the pool players worsened and the 130 player improved (splitting it 50/50), assigning the 130 player a grade of 145 and penalizing the pool players for their games against the 130 player (i.e., if the pool players had played only the games against the 130 player, the pool grade would have dropped to 145), again without stretching the grades.

The above argument is enough for me to claim that "equal grade for equal performance" rule is unsound and should be abandoned in favour of a system which neither stretches nor shrinks the grades.

Factors 'k' and total system grade...

Statement 2: If, for a grading system (of those mentioned so far), 'ka = kb' holds (in its formulae), the system preserves the total system grade, and if 'ka != kb' the system does not preserve the total system grade.

Of the systems mentioned so far, GS, CGS, AGS, ÉGS and ÉGS4 preserve the total system grade, and AGS2, ÉGS2 and ÉGS3 do not.

In a nutshell...

In all the mentioned grading systems there are two factors, 'ka' (the factor 'k' for player A) and 'kb' (the factor 'k' for player B), that are used in the formulae which correct the grades based on the game results (i.e., based on the difference between actual and expected performance). In GS (Grading System, the current grading system) both 'k' factors are equal to 1, in ÉGS (Élo Grading System) both 'k' factors are equal to 1/2, and in ÉGS2 (Élo Grading System two) each of the 'k' factors can be between 0 and 1 inclusive (the less active a player is relative to the other player, the closer his 'k' is to 1) but their sum is always 1.

I have found that a necessary condition for a grading system not to stretch (nor shrink) the grades is that the sum of the two 'k' factors is 1 (if the sum is greater than 1 the system stretches the grades, and if the sum is less than 1 the system shrinks them). As in GS both 'k' factors are equal to 1, their sum is 2 and GS stretches the grades. As in both ÉGS and ÉGS2 the sum of the 'k' factors is 1, they neither stretch nor shrink the grades.
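This condition is easy to check numerically. A minimal Python sketch (the thread's own examples use Mathematica; the helper `new_grades` and the chosen numbers are purely illustrative), using the ECF linear 'p = f(d)' with the 40 point rule:

```python
def new_grades(a, b, q, ka, kb):
    """Update grades of A and B given A's actual performance q (in %).

    Expected performance p uses the ECF linear rule with the 40-point cap."""
    d = a - b
    if d > 40:
        p = 90.0
    elif d < -40:
        p = 10.0
    else:
        p = 50.0 + d
    a2 = a + ka * (q - p)
    b2 = b + kb * ((100 - q) - (100 - p))
    return a2, b2

a, b, q = 130, 120, 100  # A (graded 130) beats B (graded 120), so q - p = 40

# GS: ka = kb = 1, so ka + kb = 2 and the gap grows by twice (q - p)
a2, b2 = new_grades(a, b, q, 1, 1)
drift_gs = (a2 - a) + (b - b2)     # 2*(q - p) = 80

# ka + kb = 1 (here 1/2 each): the gap grows by exactly (q - p)
a3, b3 = new_grades(a, b, q, 0.5, 0.5)
drift_half = (a3 - a) + (b - b3)   # q - p = 40
```

With 'ka + kb = 1' the drift equals 'q - p' itself, as the general proof below requires; GS doubles it.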

Mathematical proof...

Let 'a' and 'b' be the grades of players 'A' and 'B', 'p' the expected performance of player 'A' (the expected performance of player 'B' is then '100 - p'), 'q' the actual performance of player 'A' (the actual performance of player 'B' is then '100 - q'), and 'a2' and 'b2' the new grades of players 'A' and 'B'.

'a2' and 'b2' are calculated using the following formulae (these hold for any grading system mentioned here, including the current one):


``````a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
``````
Then we examine the term '(a2 - a) + (b - b2)', which is a measure of how much the grades drift apart due to the difference between actual and expected performance, 'q - p'.

The mathematical requirement for a grading system (using the mentioned formulae) not to stretch nor shrink the grades is that '(a2 - a) + (b - b2)' equals 'q - p' for any 'a', 'b', 'p' and 'q'.

As


``````(* grade stretching GS *)
ClearAll[a, b, a2, b2, d, g, s, ka, kb, p, q];
g = 50; s = 40;
d = a - b;
ka = 1; kb = 1;
If[d >= 0, If[d > s, p = 90, p = g*(1 + d/g)],
If[d < -s, p = 10, p = g*(1 + d/g)]];
a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
Simplify[(a2 - a) + (b - b2) == q - p]
``````
gives


``````q == p
``````
i.e., '(a2 - a) + (b - b2)' is equal to 'q - p' only if 'p = q', so GS either stretches or shrinks the grades (it can be shown that GS stretches them).

As


``````(* grade stretching EGS2 *)
ClearAll[a, b, a2, b2, d, g, ka, kb, p, q, na, nb];
d = a - b;
g = (25*Log[10])/Log[3];
ka = If[na + nb > 0, nb/(na + nb), 1/2]; kb =
If[na + nb > 0, na/(na + nb), 1/2];
p = 100/(1 + 10^(-d/g));
a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
Simplify[(a2 - a) + (b - b2) == q - p, na + nb > 0]
Print[];
Simplify[(a2 - a) + (b - b2) == q - p, na + nb == 0]
``````
gives


``````True
True
``````
i.e., '(a2 - a) + (b - b2)' is equal to 'q - p' for any 'a', 'b', 'p' and 'q' in both cases, 'na + nb > 0' and 'na + nb = 0', so EGS2 neither stretches nor shrinks the grades.

Or in general, as


``````(* grade stretching *)
ClearAll[a, b, a2, b2, ka, kb, p, q];
a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
Simplify[(a2 - a) + (b - b2) == q - p]
Print[];
Simplify[(a2 - a) + (b - b2) == q - p, ka + kb == 1]
``````
gives


``````0 == (-1 + ka + kb)*(p - q)
True
``````
i.e., '(a2 - a) + (b - b2)' is equal to 'q - p' for any 'a', 'b', 'p' and 'q' if and only if 'ka + kb = 1'; a grading system neither stretches nor shrinks the grades if 'ka + kb = 1'.

AGS3...

The closest system to GS which does not stretch (nor shrink) the grades (due to 'k' factors) is AGS3.

Rule 2b: For a win you score average grade plus 25; for a draw, average grade; and for a loss, average grade minus 25. Average grade is half of the sum of your and your opponent's grade. Note that, if your opponent's grade differs from yours by more than 40 points, it is taken to be exactly 40 points above (or below) yours. At the end of the season an average of points-per-game is taken, and that is your new grade.

ÉGS5 and ÉGS6...

ÉGS and ÉGS2 use 'g = (25*Log[10])/Log[3] = 52.3975...' in 'p = 100/(1 + 10^(-d/g))'. It can be shown that in order to match FIDE's choice of the constant in their logistic curve it should be 'g = 50' (Mr David Welch has found that 'g = (25*Log[10])/Log[3] = 52.3975...' does not match FIDE's constant).

Therefore we introduce two new systems, ÉGS5 and ÉGS6: ÉGS5 is ÉGS with 'g = 50' and ÉGS6 is ÉGS2 with 'g = 50'.

Figure 2: Relationship between expected performance 'p' and grade difference 'd' as defined in GS (green line); CGS, AGS and AGS2 (blue line); ÉGS, ÉGS2, ÉGS3 and ÉGS4 (red line); ÉGS5 and ÉGS6 (yellow line); and the normal relationship 'p = 100*(1 + Erf[d/50])/2', where the error function Erf[z] is the integral of the Gaussian distribution, as originally defined by Élo (brown line above yellow). Expected performance 'p' is a function of grade difference 'd', i.e., 'p = f(d)'. Note that both FIDE and USCF switched from the normal (brown line) to the logistic (yellow line) relationship 'p = f(d)', which they found provides a better fit for the actual results achieved.

It can be shown that 'g = (25*Log[10])/Log[3] = 52.3975...' minimizes the difference between the logistic and linear 'p = f(d)' approximately in the interval '0 <= d <= 34', and 'g = 50' approximately in the interval '0 <= d <= 41'.


``````-------------------------------------------
p = f(d)
d   green   blue    red yellow  brown
-------------------------------------------
0    50.0   50.0   50.0   50.0   50.0
1    51.0   51.0   51.1   51.2   51.1
2    52.0   52.0   52.2   52.3   52.3
3    53.0   53.0   53.3   53.4   53.4
4    54.0   54.0   54.4   54.6   54.5
5    55.0   55.0   55.5   55.7   55.6
6    56.0   56.0   56.6   56.9   56.7
7    57.0   57.0   57.6   58.0   57.8
8    58.0   58.0   58.7   59.1   59.0
9    59.0   59.0   59.8   60.2   60.0
10    60.0   60.0   60.8   61.3   61.1
11    61.0   61.0   61.9   62.4   62.2
12    62.0   62.0   62.9   63.5   63.3
13    63.0   63.0   63.9   64.5   64.3
14    64.0   64.0   64.9   65.6   65.4
15    65.0   65.0   65.9   66.6   66.4
16    66.0   66.0   66.9   67.6   67.5
17    67.0   67.0   67.9   68.6   68.5
18    68.0   68.0   68.8   69.6   69.5
19    69.0   69.0   69.7   70.6   70.5
20    70.0   70.0   70.7   71.5   71.4
21    71.0   71.0   71.6   72.5   72.4
22    72.0   72.0   72.4   73.4   73.3
23    73.0   73.0   73.3   74.3   74.2
24    74.0   74.0   74.2   75.1   75.1
25    75.0   75.0   75.0   76.0   76.0
26    76.0   76.0   75.8   76.8   76.9
27    77.0   77.0   76.6   77.6   77.7
28    78.0   78.0   77.4   78.4   78.6
29    79.0   79.0   78.1   79.2   79.4
30    80.0   80.0   78.9   79.9   80.2
31    81.0   81.0   79.6   80.7   81.0
32    82.0   82.0   80.3   81.4   81.7
33    83.0   83.0   81.0   82.0   82.5
34    84.0   84.0   81.7   82.7   83.2
35    85.0   85.0   82.3   83.4   83.9
36    86.0   86.0   82.9   84.0   84.6
37    87.0   87.0   83.6   84.6   85.2
38    88.0   88.0   84.2   85.2   85.9
39    89.0   89.0   84.7   85.8   86.5
40    90.0   90.0   85.3   86.3   87.1
41    90.0   91.0   85.8   86.9   87.7
42    90.0   92.0   86.4   87.4   88.3
43    90.0   93.0   86.9   87.9   88.8
44    90.0   94.0   87.4   88.4   89.3
45    90.0   95.0   87.8   88.8   89.8
46    90.0   96.0   88.3   89.3   90.3
47    90.0   97.0   88.7   89.7   90.8
48    90.0   98.0   89.2   90.1   91.3
49    90.0   99.0   89.6   90.5   91.7
50    90.0  100.0   90.0   90.9   92.1
51    90.0  100.0   90.4   91.3   92.5
52    90.0  100.0   90.8   91.6   92.9
53    90.0  100.0   91.1   92.0   93.3
54    90.0  100.0   91.5   92.3   93.7
55    90.0  100.0   91.8   92.6   94.0
56    90.0  100.0   92.1   92.9   94.3
57    90.0  100.0   92.4   93.2   94.7
58    90.0  100.0   92.7   93.5   95.0
59    90.0  100.0   93.0   93.8   95.2
60    90.0  100.0   93.3   94.1   95.5
61    90.0  100.0   93.6   94.3   95.8
62    90.0  100.0   93.8   94.6   96.0
63    90.0  100.0   94.1   94.8   96.3
64    90.0  100.0   94.3   95.0   96.5
65    90.0  100.0   94.6   95.2   96.7
66    90.0  100.0   94.8   95.4   96.9
67    90.0  100.0   95.0   95.6   97.1
68    90.0  100.0   95.2   95.8   97.3
69    90.0  100.0   95.4   96.0   97.5
70    90.0  100.0   95.6   96.2   97.6
71    90.0  100.0   95.8   96.3   97.8
72    90.0  100.0   95.9   96.5   97.9
73    90.0  100.0   96.1   96.6   98.1
74    90.0  100.0   96.3   96.8   98.2
75    90.0  100.0   96.4   96.9   98.3
76    90.0  100.0   96.6   97.1   98.4
77    90.0  100.0   96.7   97.2   98.5
78    90.0  100.0   96.9   97.3   98.6
79    90.0  100.0   97.0   97.4   98.7
80    90.0  100.0   97.1   97.5   98.8
81    90.0  100.0   97.2   97.7   98.9
82    90.0  100.0   97.3   97.8   99.0
83    90.0  100.0   97.5   97.9   99.1
84    90.0  100.0   97.6   98.0   99.1
85    90.0  100.0   97.7   98.0   99.2
86    90.0  100.0   97.8   98.1   99.3
87    90.0  100.0   97.9   98.2   99.3
88    90.0  100.0   98.0   98.3   99.4
89    90.0  100.0   98.0   98.4   99.4
90    90.0  100.0   98.1   98.4   99.5
91    90.0  100.0   98.2   98.5   99.5
92    90.0  100.0   98.3   98.6   99.5
93    90.0  100.0   98.3   98.6   99.6
94    90.0  100.0   98.4   98.7   99.6
95    90.0  100.0   98.5   98.8   99.6
96    90.0  100.0   98.5   98.8   99.7
97    90.0  100.0   98.6   98.9   99.7
98    90.0  100.0   98.7   98.9   99.7
99    90.0  100.0   98.7   99.0   99.7
100    90.0  100.0   98.8   99.0   99.8
101    90.0  100.0   98.8   99.1   99.8
102    90.0  100.0   98.9   99.1   99.8
103    90.0  100.0   98.9   99.1   99.8
104    90.0  100.0   99.0   99.2   99.8
105    90.0  100.0   99.0   99.2   99.9
106    90.0  100.0   99.1   99.2   99.9
107    90.0  100.0   99.1   99.3   99.9
108    90.0  100.0   99.1   99.3   99.9
109    90.0  100.0   99.2   99.3   99.9
110    90.0  100.0   99.2   99.4   99.9
111    90.0  100.0   99.2   99.4   99.9
112    90.0  100.0   99.3   99.4   99.9
113    90.0  100.0   99.3   99.5   99.9
114    90.0  100.0   99.3   99.5   99.9
115    90.0  100.0   99.4   99.5   99.9
116    90.0  100.0   99.4   99.5   99.9
117    90.0  100.0   99.4   99.5  100.0
118    90.0  100.0   99.4   99.6  100.0
119    90.0  100.0   99.5   99.6  100.0
120    90.0  100.0   99.5   99.6  100.0
-------------------------------------------
``````

Table 1: Relationship between expected performance 'p' and grade difference 'd' as defined in GS (green line); CGS, AGS and AGS2 (blue line); ÉGS, ÉGS2, ÉGS3 and ÉGS4 (red line); ÉGS5 and ÉGS6 (yellow line); and the normal relationship 'p = 100*(1 + Erf[d/50])/2', where the error function Erf[z] is the integral of the Gaussian distribution, as originally defined by Élo (brown line above yellow). Expected performance 'p' is a function of grade difference 'd', i.e., 'p = f(d)'. Note that both FIDE and USCF switched from the normal (brown line) to the logistic (yellow line) relationship 'p = f(d)', which they found provides a better fit for the actual results achieved.

Which 'p = f(d)'...

It is impossible to measure chess ability independently of chess performance (there is no device one can put on a chess player's head to get a measure of his or her chess ability); if that were possible, one could plot 'p' against 'd' and find the best fit for 'p = f(d)'. Nevertheless, assuming that for small differences in chess ability (say 'd <= 30') the relationship between chess performance and difference in chess ability is linear, one can take grades for 'd <= 30' to be in fact chess abilities and, taking into account game records where 'd > 30', plot 'p' against 'd' (black dots in figure 3 below) and find that 'p = f(d)' for 'd > 30' follows one of the sigmoid curves (yellow, brown and red lines in figure 3) more closely than the linear approximations (green and blue lines in figure 3).

Figure 3: Mr Welch's finding. The '(d > 30, q)' discrete experimental points match one of the sigmoid curves (yellow, brown and red lines) better than the linear approximations (green and blue lines). Note that both FIDE and USCF switched from the normal (brown line above red) to the logistic (yellow line) relationship 'p = f(d)', which they found provides a better fit for the actual results achieved. Please note that the discrete points shown are for illustration purposes only; they are not the result of an actual analysis of experimental data, and are drawn to best fit the yellow line. (Blue line: ECF linear with 50 point rule; green line: ECF linear with 40 point rule; brown line: Élo's normal, 'p = 100*(1 + Erf[d/g])/2', 'g = 50', where the error function Erf[z] is the integral of the Gaussian distribution; red line: Élo's logistic, 'p = 100/(1 + 10^(-d/g))', with 'g = (25*Log[10])/Log[3] = 52.3975...'; yellow line: Élo's logistic, 'p = 100/(1 + 10^(-d/g))', with 'g = 50'.)

Variable factor 'k'...

Figure 4: Factor 'ka' (used in AGS2, ÉGS2, ÉGS3 and ÉGS6) as a function of 'na' and 'nb' ('na' and 'nb' are the numbers of games players A and B played in the last season). Factor 'ka' is used in the formulae which correct player A's grade based on the difference between actual and expected performance against player B ('a2 = a + ka*(q - p)'). The idea is to make less trusted or established grades (based on frequency of play) change more rapidly. Note that if player A's opponent is ungraded (i.e., 'nb = 0') and player A is not ungraded ('na > 0', i.e., 'na >= 1') then 'ka = 0', and consequently player A's grade is not affected by the games he or she played against player B (i.e., 'a2' remains unchanged, 'a2 = a + ka*(q - p) = a + 0*(q - p) = a'). For systems which neither stretch nor shrink the grades it always holds that 'ka + kb = 1'.

ECF grade vs FIDE rating scale...

Elo (originally) suggested scaling ratings so that a difference of 200 rating points in chess would mean that the stronger player has an expected score of approximately 0.75.

(In order to keep the present ECF grade scale) one should suggest scaling grades so that a difference of 25 (not 200) grading points in chess means that the stronger player has an expected score of approximately 0.75 (i.e., 75%).

To me the ECF grade scale makes more sense than the FIDE rating scale, as for "small" grade differences (approximately '0 <= d <= 30') the ECF grade difference is approximately the expected performance difference (in percentage points). Say, if the grade difference between two players is 10 grading points, the stronger player is expected to score approximately 10% more, i.e., '50 + 10 = 60%', so '10' is approximately the expected performance difference (in percentage points). In the case of FIDE ratings a player would have to be approximately 80 rating points stronger in order to score 60%, and '80' is approximately the expected performance difference (in percentage points) multiplied by 8 (why by 8, I do not know).

For "larger" grade differences (approximately 'd > 30') ECF grade difference is (or should be) larger than the expected performance difference (in percents), say if a grade difference between two players is 140 grading points, according to Ã‰GS6, the stronger player is expected to score approximately 49.84% more, i.e., '50 + 49.84 = 99.84%', so '140' is larger than the expected performance difference (in percents).

Rule approach...

Rule 1a: For a win you score your opponent's grade plus 50; for a draw, your opponent's grade; and for a loss, your opponent's grade minus 50. Note that, if your opponent's grade differs from yours by more than 40 points, it is taken to be exactly 40 points above (or below) yours. At the end of the season an average of points-per-game is taken, and that is your new grade.

Rule 2b: For a win you score average grade plus 25; for a draw, average grade; and for a loss, average grade minus 25. Average grade is half of the sum of your and your opponent's grade. Note that, if your opponent's grade differs from yours by more than 40 points, it is taken to be exactly 40 points above (or below) yours. At the end of the season an average of points-per-game is taken, and that is your new grade.

Rule 1a...

In order not to stretch nor shrink (drift) the grades one should apply rule 1a to only one of the players in the game. Say, if players A and B have played a game, when grading it, if one applies rule 1a to correct the grade of player A one should not apply it to correct the grade of player B (one should omit grading of player B in that game), and vice versa. The reason is that rule 1a sets the maximum possible correction for a player's grade, and if one applied it to correct the grades of both players the grades would drift apart (as one would apply too much correction).

There is nothing intrinsically wrong in grading a game in a way that changes one player's grade by the maximum amount and leaves the other player's grade unchanged. The problem, though, is that the number of games in which such a grade correction distribution applies is rather small; in most games either it does not apply or one does not know whether it applies.

Let us assume that there are two player pools, pool 1 with an average grade of 130 and pool 2 with an average grade of 160. Let us assume that player A, graded 130, is a member of pool 1 and that player B, graded 160, is a member of pool 2. Let us assume that player A played all players in pool 2, that player B played all players in pool 1, and that the game between players A and B ended in a draw. We are facing the problem of correcting the grades of players A and B for the game they drew.

One can argue: as player A played a pool of players with an average grade of 160, I will increase the grade of player A by the maximum possible amount and leave the grade of player B unchanged. This is equivalent to saying that the players drew because player A improved (say he was lucky enough to be coached by Garry Kasparov over the summer break) and player B neither improved nor worsened. Fine, one applies rule 1a to the game, grading only player A.

But one can also argue: as player B played a pool of players with an average grade of 130, I will decrease the grade of player B by the maximum possible amount and leave the grade of player A unchanged. This is equivalent to saying that the players drew because player B worsened (say he had fallen in love over the summer break and all he thinks about is his girlfriend) and player A neither improved nor worsened. One applies rule 1a to the game, grading only player B.

Which argument is correct? It could be one or the other, or neither; there are infinite possibilities. One could try to estimate whether one of the players improved or worsened and by how much, but to make such an assessment one would need a thorough analysis of the players' lives, maybe of the game itself, etc. So the best guess is to change both players' grades by half of the maximum amount: one increases the grade of player A and decreases the grade of player B by half of the maximum amount.

Every player eventually plays a pool of players with some average grade; this pool should have no bearing on the decision of how to distribute grade corrections when grading individual games.

The main flaw in GS is that it applies rule 1a to change the grades of both players in the game (it should apply it to change the grade of only one of the players). This causes grade stretching (or grade drifting).

Rule 2b...

In order not to stretch nor shrink (drift) the grades one should apply rule 2b to both players in the game. The reason is that rule 2b sets half of the maximum possible correction for a player's grade, and if one applied it to correct the grade of only one of the players the grades would drift towards each other (as one would apply too little correction).

So one has no problem in deciding how to distribute the grade correction in each individual game: by applying rule 2b, one increases the grade of one player and decreases the grade of the other player by half of the maximum amount.
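Grading one game under rule 2b for both players can be sketched as follows (a hypothetical Python helper; the draw example is the 130-vs-160 game discussed above):

```python
def rule_2b_update(a, b, result_a):
    """Grade corrections when rule 2b is applied to both players.

    result_a: 1, 0.5 or 0 from player A's point of view."""
    d = max(-40, min(40, a - b))  # 40-point rule
    p = 50 + d                    # A's expected score (%), linear rule
    q = result_a * 100            # A's actual score (%)
    a2 = a + 0.5 * (q - p)        # half the maximum correction each way
    b2 = b + 0.5 * (p - q)
    return a2, b2

a2, b2 = rule_2b_update(130, 160, 0.5)  # the 130 player draws a 160 player
```

Both grades move to 145, the figure used earlier for the 50/50 split, and the total grade 'a + b' is preserved by construction.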

Replacing GS...

In my opinion the three candidates for replacing GS are AGS3, ÉGS5 and ÉGS6, with ÉGS6 the best and ÉGS5 the second best.

AGS3 is GS with the 'k' factors equal to '1/2' (it uses the ECF's linear approximation for 'p = f(d)' and does not stretch the grades).

ÉGS5 is a sort of ECF equivalent of FIDE's Élo (the logistic curve is used for 'p = f(d)', which is regarded as more accurate than the ECF's linear approximation; the grades are ECF grades, not FIDE ratings, i.e., a strong grandmaster is about 270, not 2800; grading is done every season rather than after every tournament; it does not stretch the grades).

ÉGS6 has a similar improvement (taken in a simple form) over ÉGS5 as Glicko has over FIDE's Élo: it accounts for grade trust (or establishment) based on frequency of play (i.e., less trusted or established grades change more rapidly than more trusted or established ones; consequently, in the extreme case, ungraded players do not affect the grades of graded players; it uses the logistic curve for 'p = f(d)'; it does not stretch the grades).


``````--------------------------------------------------------------
grading   stretches  uses FIDE    changes less    preserves
system    grades     'p = f(d)'   trusted grades  total system
          ('k')      (yellow)     more rapidly    grade
--------------------------------------------------------------
GS        yes        no           no              yes
AGS3      no         no           no              yes
ÉGS5      no         yes          no              yes
ÉGS6      no         yes          yes             no
--------------------------------------------------------------
``````
Table 2: Main differences between GS (current Grading System), AGS3 (Amended Grading System three), ÉGS5 (Élo Grading System five) and ÉGS6 (Élo Grading System six).

"Junior problem"...

The so called "junior problem", or in general a problem of players whose chess abilites change rapidly (which has been addressed in Glicko 2) has not been addressed in any of the mentioned systems (so neither in Ã‰GS6 nor Ã‰GS5), though (in some posts in the thread) it has been hinted on how the problem might be approached.

One simple approach to the "junior problem" could be the following: after calculating grades (in the normal way) using GS, AGS3, ÉGS5 or ÉGS6 (I am advocating ÉGS6), one calculates the average between the old and the new (just calculated) grades, then repeats the calculation (in the normal way) using GS, AGS3, ÉGS5 or ÉGS6, but this time starting from the average grades. This should address (to some extent) the problem of players whose chess abilities change rapidly (say fast-improving juniors). This idea still needs to be checked (it may well cause grade stretching or shrinking).
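One possible reading of this two-pass idea, sketched in Python with the AGS3 per-game update (the helper names are illustrative, and this is only the untested idea described above, not an agreed method):

```python
def season_grade(old, results, update):
    """One normal seasonal calculation: the average of per-game new grades.

    results: list of (opponent_grade, q) pairs; update(a, b, q) -> a2."""
    return sum(update(old, b, q) for b, q in results) / len(results)

def two_pass_grade(old, results, update):
    """Proposed 'junior' adjustment: run the normal calculation, average
    the old and new grades, then rerun the calculation from that average."""
    first = season_grade(old, results, update)
    midpoint = (old + first) / 2
    return season_grade(midpoint, results, update)

def ags3_update(a, b, q):
    # AGS3 per-game update: ka = 1/2, linear p with the 40-point cap
    d = max(-40, min(40, a - b))
    return a + 0.5 * (q - (50 + d))

# A fast improver graded 100 scores 100% against three 130-graded players:
g1 = season_grade(100, [(130, 100)] * 3, ags3_update)
g2 = two_pass_grade(100, [(130, 100)] * 3, ags3_update)
```

Here the single pass gives 140, while the two-pass version gives 150, i.e., the improving player's grade moves further, which is the intended effect.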

Another idea for resolving the "junior problem" could be to make the 'k' factors for juniors (i.e., players whose chess abilities change rapidly) higher than for other players, while keeping the sum of the two factors 'ka' and 'kb' at 1 (which would guarantee that the grades are neither stretched nor shrunk). The measure of change of one's chess ability could be the grade change over the last two seasons (the chess abilities of players who have played in fewer than two seasons so far may be assumed to change rapidly). The idea is to trust the grade of a rapidly improving junior (or any other player whose chess ability changes rapidly) less than that of an ordinary adult player whose grade is more or less constant. With this approach junior grades (or the grades of players whose chess abilities change rapidly) would change faster, affecting the grades of the other players they have played less. The problem would remain of deciding how much to correct the 'k' factor for change in chess ability and how much for grade trust (establishment) based on frequency of play.

Estimated effect on stretching...

Using a system where the sum of the 'k' factors is always equal to 1 (i.e., addressing the grade stretching problem) would affect the grades significantly in the longer run, as in GS the grade stretching happens all the time (the effect increases with performance difference) and accumulates with time. Élo's logistic curve (present in both ÉGS5 and ÉGS6) wouldn't affect the grades significantly, as its effect is relatively small for relatively small grade differences (the effect increases with grade difference), but it may affect the grades noticeably where the grade difference is big. Professor Glickman's idea about grade establishment wouldn't affect the grades significantly in general, as its effect is in general relatively small, but it may affect the grades significantly where ungraded (or less active) players play graded (or more active) players.

The formulae...

Let 'a' and 'b' be the grades of players 'A' and 'B', 'p' the expected performance of player 'A' (the expected performance of player 'B' is then '100 - p'), 'q' the actual performance of player 'A' (the actual performance of player 'B' is then '100 - q'), 'd = a - b' the grade difference, and 'na' and 'nb' the numbers of games players 'A' and 'B' played in the last season for which grades 'a' and 'b' were calculated. Then the new grades of players 'A' and 'B', 'a2' and 'b2', are calculated as follows:

GS (current Grading System) formulae:


``````(* GS *)
ClearAll[a, b, a2, b2, d, g, s, ka, kb, p, q];
a = 120; b = 120;
q = 50;
g = 50; s = 40;
d = a - b;
ka = 1; kb = 1;
If[d >= 0, If[d > s, p = 90, p = g*(1 + d/g)],
If[d < -s, p = 10, p = g*(1 + d/g)]];
a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
Round[N[a2]]
Print[];
Round[N[b2]]
``````
AGS3 (Amended Grading System three) formulae:


``````(* AGS3 *)
ClearAll[a, b, a2, b2, d, g, s, ka, kb, p, q];
a = 120; b = 120;
q = 50;
g = 50; s = 40;
d = a - b;
ka = 1/2; kb = 1/2;
If[d >= 0, If[d > s, p = 90, p = g*(1 + d/g)],
If[d < -s, p = 10, p = g*(1 + d/g)]];
a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
Round[N[a2]]
Print[];
Round[N[b2]]
``````
ÉGS5 (Élo Grading System five) formulae:


``````(* EGS5 *)
ClearAll[a, b, a2, b2, d, g, ka, kb, p, q];
a = 120; b = 120;
q = 50;
d = a - b;
g = 50;
ka = 1/2; kb = 1/2;
p = 100/(1 + 10^(-d/g));
a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
Round[N[a2]]
Print[];
Round[N[b2]]
``````
ÉGS6 (Élo Grading System six) formulae:


``````(* EGS6 *)
ClearAll[a, b, a2, b2, d, g, ka, kb, p, q, na, nb];
na = 30; nb = 30;
a = 120; b = 120;
q = 50;
d = a - b;
g = 50;
ka = If[na + nb > 0, nb/(na + nb), 1/2]; kb =
If[na + nb > 0, na/(na + nb), 1/2];
p = 100/(1 + 10^(-d/g));
a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
Round[N[a2]]
Print[];
Round[N[b2]]
``````
(If players 'A' and 'B' played only one game in the season, 'q' is either 100, 0 or 50; if they played more than one game it can be any number between 0 and 100 inclusive. 'Log[z]' gives the natural logarithm of 'z' (logarithm to base 'e') and 'x^y' gives 'x' to the power 'y'. The input parameters for GS, AGS3 and ÉGS5 are 'a', 'b' and 'q'; the input parameters for ÉGS6 are 'a', 'b', 'na', 'nb' and 'q'; the output parameters are 'a2' and 'b2'.)

Note: The formulae are used to calculate a new grade for player 'A' for every opponent 'B' he or she played in the season. At the end of the season the average of the calculated grades (one per opponent 'B') is taken, and this average is player 'A''s new grade for the season. (For GS, AGS3 and ÉGS5, if a player has not played enough games in the season, games from the previous season or seasons are taken into the calculation; for ÉGS6 no games from previous seasons need to be taken into account.)

Ungraded players...

Rule 1a: For a win you score your opponent's grade plus 50; for a draw, your opponent's grade; and for a loss, your opponent's grade minus 50. Note that, if your opponent's grade differs from yours by more than 40 points, it is taken to be exactly 40 points above (or below) yours. At the end of the season an average of points-per-game is taken, and that is your new grade.

Let 'a' and 'b' be the grades of players 'A' and 'B', 'p' the expected performance of player 'A' (the expected performance of player 'B' is then '100 - p'), 'q' the actual performance of player 'A' (the actual performance of player 'B' is then '100 - q'), and 'a2' and 'b2' the new grades of players 'A' and 'B'.

'a2' and 'b2' are calculated using the following formulae (these hold for any grading system mentioned here, including the current one):


``````a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
``````
Rule 1a is equivalent to:


``````g = 50; s = 40;
ka = 1; kb = 1;
d = a - b;
If[d >= 0, If[d > s, p = 90, p = g*(1 + d/g)],
If[d < -s, p = 10, p = g*(1 + d/g)]];
a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
``````
where player A is you and player B is your opponent.

'a2' can be expressed in terms of 'b': substituting 'a = b + d' (and, for rule 1a, 'ka = 1' and 'p = 50 + d'), one gets 'a2 = a + ka*(q - p) = b + d + (q - 50 - d) = -50 + b + q'.

In general 'a2' is a function of 'b', 'q' and 'd' (say AGS3's 'a2' is equal to '(-50 + 2*b + d + q)/2'), but GS's 'a2' is a function of 'b' and 'q' only, which means that one needs to know only your opponent's grade 'b' and your actual performance 'q' in order to calculate your new grade 'a2'.
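The algebra can be verified numerically; a small Python sketch (within the 40 point rule, and with 'ka = 1' as in GS):

```python
def gs_new_grade(a, b, q):
    """GS (rule 1a) per-game new grade for player A, within the 40-point rule."""
    d = a - b
    assert abs(d) <= 40, "40-point rule assumed to be inactive here"
    p = 50 + d                 # ECF linear expected performance
    return a + (q - p)         # ka = 1; algebraically equal to b + q - 50

# Whatever A's own grade, the result depends only on b and q:
for a in (100, 120, 140):
    assert gs_new_grade(a, 110, 100) == 160   # win: b + 50
    assert gs_new_grade(a, 110, 50) == 110    # draw: b
```

The loop illustrates the point in the text: A's own grade 'a' cancels out, leaving '-50 + b + q'.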

This happy coincidence can be utilized in the calculation of the grades of ungraded players. We could apply rule 1a (which can be used if one changes the grade of only one of the players in the game) to all games where ungraded players play graded players, using it for the grade calculation of the ungraded players only (note that the grades of ungraded players need not be estimated, as they are not needed in the calculation). Applying rule 1a for the ungraded player but omitting the grading of the game for the graded player is probably the best one can do (this is one of the rare occasions where one knows that one wants to credit or penalize only one player in the game by the fully allowed amount).
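A hypothetical Python sketch of this procedure for an ungraded player (illustrative only; note that no estimate of the player's own grade appears anywhere in it):

```python
def initial_grade(games):
    """Rule-1a grade for a previously ungraded player.

    games: list of (opponent_grade, result) with result 1, 0.5 or 0.
    Per game the player scores opponent's grade +50 / +0 / -50; the new
    grade is the average of these per-game points."""
    pts = [b + 100 * (r - 0.5) for b, r in games]
    return sum(pts) / len(pts)

# Scoring 80% over five games against 100-graded opponents gives 130:
grade = initial_grade([(100, 1)] * 4 + [(100, 0)])
```

This reproduces the kind of spread discussed in the replies: a new player scoring 80% against 100-graded opposition comes out 30 points above them.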
Last edited by Robert Jurjevic on Thu Jan 20, 2011 11:25 pm, edited 115 times in total.
Robert Jurjevic
Vafra

Roger de Coverly
Posts: 19453
Joined: Tue Apr 15, 2008 2:51 pm

### Re: GRADING ANOMALIES

Let us assume that two ungraded players both with estimated grade of 100
Where do you get the estimated grade from? The ECF system now works backwards from results to produce its initial estimate.
that one of the players scores 80%.
All the ECF system can infer is that the two players are 30 points apart. They need to have played some players with established grades to find out whether they should be 100 and 130 or 150 and 180 or 200 and 230

Many, many years ago, I believe, graders would assign all new players a grade of 100 unless there was evidence they were stronger than that. This gave "support" to the bottom end of the grading system. It also meant that players who were any good could get into the 120s and above without much delay. Those who were well below the 100 standard would drop, of course, but many retired from active play before their overratedness became too apparent. In those days there was no attempt to rate under-10 tournaments, so the minimum standard for rating was somewhat higher. Sometimes even the bottom division of a league wouldn't be rated.

My point is that "graders' estimates" disappeared from the system many years ago so you wouldn't get this 70/130 effect. I don't think you would have got it even in the past because graders would have given a higher starting estimate to a player scoring 80%.

Robert Jurjevic
Posts: 207
Joined: Wed May 16, 2007 1:31 pm
Location: Surrey

### Re: GRADING ANOMALIES

All of my arguments would hold if the grades were established (I thought that the difference between expected and actual performance in the examples would be easier accepted if it was assumed that the grades were estimated).
Robert Jurjevic
Vafra

Brian Valentine
Posts: 491
Joined: Fri Apr 03, 2009 1:30 pm

### Re: GRADING ANOMALIES

Ben Purton wrote:Brian your surname is pretty cool, id name my kid Vicent if I had that surname

Vincent Valentine, if he couldn't get the ladies no one could...
We called our kid Graham and he does well enough to have no time for chess

Robert Jurjevic
Posts: 207
Joined: Wed May 16, 2007 1:31 pm
Location: Surrey

### Re: GRADING ANOMALIES

ECF_grades computer program updated...

Please note that the ECF_grades computer program (a command line executable for Windows and a C source file, together with a sample input file, which is my games for this season to date) can be found at the following Internet link (version 3.0, which now includes GS, AGS3, ÉGS5 and ÉGS6 calculation; CGS, AGS, ÉGS and ÉGS2 removed, AGS3, ÉGS5 and ÉGS6 added)...

http://www.jurjevic.org.uk/chess/grade/

The program can calculate one's new grade (or performance in an event) based on the games supplied in a PGN file (moves are not required, but grades, results and, for ÉGS6, the number of games played in the previous season are) using the GS, AGS3, ÉGS5 or ÉGS6 formulae.

My current grade for the 2008/09 season is as follows (new ECF grade for chess player Jurjevic based on 21 games in the all0809.pgn file, with a score of 5-6-10, 47.6%, and an average opponent's grade of 97):


``````----------------------
GS  AGS3  ÉGS5  ÉGS6
----------------------
93    91    92    92
----------------------
``````
Robert Jurjevic
Vafra