GRADING ANOMALIES

Robert Jurjevic · Post by **Robert Jurjevic** » Wed Oct 22, 2008 3:48 pm

Proposal...

Maybe (in the light of the introduction of the new corrected grades) now is the right time to ask the ECF officials whether they would consider switching to Élo Grading System two (ÉGS2) rather than continuing to use the current Grading System (GS)?

The main advantages (as I see it) of ÉGS2 over GS are as follows:

1. The relationship between expected performance 'p' and grade difference 'd' is the logistic curve (currently adopted by the FIDE Élo Rating System), which is in accord with Mr David Welch's finding that chess performance as a function of difference in chess abilities matches the logistic curve (red line in figures 1 and 2 below) more closely than the curve adopted by GS (green line in figures 1 and 2 below).

2. Grades of less active chess players change more rapidly. For example, the estimated grade of a junior who has just entered the system would change more rapidly than the grades of the well-established players he played against; likewise, if you had been inactive for a period of time and re-entered the system with an old grade, your grade would change more rapidly than the grades of the well-established players you played against. (This is in fact a simplified version of the improvement Glicko 1 makes over FIDE Élo.)

Note: ÉGS2 grades are on the ECF (rather than FIDE) scale, i.e. a strong GM is around 270, not around 2800, etc.

Logistic curve...

Figure 1: Relationship between expected performance 'p' and grade difference 'd' as defined in GS (green line), CGS and AGS (blue line) and ÉGS and ÉGS2 (red line). Expected performance 'p' is a function of grading difference 'd', i.e. 'p = f(d)'.

Figure 2: Mr Welch's finding. The '(d>30, q)' discrete experimental points match ÉGS2's 'p = f(d)' closely (please note that the discrete points shown are for illustration only; they are not the result of an actual analysis of the experimental data).

Note: It is impossible to measure chess abilities independently of chess performances (there is no device one can put on the heads of chess players to get a measure of their chess abilities). However, assuming that for small differences in chess abilities ('d<=30') the relationship between chess performance and difference in chess abilities is linear, one can find (using experimental data) that the relationship between chess performance and difference in chess abilities matches the logistic curve closely: you assume that grades for 'd<=30' are in fact chess abilities, then you plot the discrete experimental points '(d>30, q)' and find that they match ÉGS2's 'p = f(d)' closely.

The formulae...

Let 'a' and 'b' be the grades of players 'A' and 'B', 'p' the expected performance of player 'A' (the expected performance of player 'B' is then '100 - p'), 'q' the actual performance of player 'A' (the actual performance of player 'B' is then '100 - q'), 'd = a - b' the grade difference, and 'na' and 'nb' the number of games players 'A' and 'B' played in the season for which grades 'a' and 'b' were calculated. Then the new grades of players 'A' and 'B', 'a2' and 'b2', are calculated as follows:

GS (current Grading System) formulae:


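(* GS: current Grading System *)
ClearAll[a, b, a2, b2, d, g, s, ka, kb, p, q, na, nb];
a = 120; b = 120;   (* current grades of players A and B *)
q = 0;              (* actual performance of A, in percent *)
g = 50; s = 40;     (* p = g + d inside the band; differences beyond +/- s are capped *)
d = a - b;
ka = 1; kb = 1;     (* both players carry the full adjustment *)
(* expected performance of A: linear in d, capped at 90 and 10 *)
If[d >= 0, If[d > s, p = 90, p = g*(1 + d/g)], 
    If[d < -s, p = 10, p = g*(1 + d/g)]];
a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
Round[N[a2]]
Print[];
Round[N[b2]]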

ÉGS2 (Élo Grading System two) formulae:


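(* EGS2: Elo Grading System two *)
ClearAll[a, b, a2, b2, d, g, ka, kb, p, q, na, nb];
na = 40; nb = 40;   (* games played by A and B in the season *)
a = 120; b = 120;   (* current grades of players A and B *)
q = 0;              (* actual performance of A, in percent *)
d = a - b;
g = (25*Log[10])/Log[3];   (* scale constant: d = 25 gives p = 75 *)
(* each player's share of the adjustment is the other player's share of the games *)
ka = If[na + nb > 0, nb/(na + nb), 1/2];
kb = If[na + nb > 0, na/(na + nb), 1/2];
p = 100/(1 + 10^(-d/g));   (* expected performance of A: logistic curve *)
a2 = a + ka*(q - p);
b2 = b + kb*((100 - q) - (100 - p));
Round[N[a2]]
Print[];
Round[N[b2]]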

If players 'A' and 'B' played only one game in the season, 'q' is 100, 0 or 50; if they played more than one game, it can be any number between 0 and 100 inclusive. 'Log[z]' gives the natural logarithm of 'z' (logarithm to base 'e') and 'x^y' gives 'x' to the power 'y'. The input parameters for GS are 'a', 'b' and 'q'; the input parameters for ÉGS2 are 'a', 'b', 'na', 'nb' and 'q'. The output parameters are 'a2' and 'b2'.
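For readers without Mathematica, the two listings can be sketched in Python. This is only an illustrative port under the same definitions of 'a', 'b', 'q', 'na' and 'nb'; the function names (`gs_expected`, `egs2_expected`, `gs_update`, `egs2_update`) are hypothetical labels and do not come from the ECF or from the original post.

```python
import math


def gs_expected(d, g=50.0, s=40.0):
    """Expected performance p (0..100) under GS: linear in d, capped beyond +/- s."""
    if d > s:
        return 90.0
    if d < -s:
        return 10.0
    return g * (1 + d / g)


def egs2_expected(d):
    """Expected performance p (0..100) under EGS2: logistic curve on the ECF scale."""
    g = 25 * math.log(10) / math.log(3)  # with this g, d = 25 gives p = 75
    return 100 / (1 + 10 ** (-d / g))


def gs_update(a, b, q):
    """Return (a2, b2) under GS, where both players carry the full adjustment."""
    p = gs_expected(a - b)
    return a + (q - p), b + ((100 - q) - (100 - p))


def egs2_update(a, b, q, na, nb):
    """Return (a2, b2) under EGS2; the less active player moves more."""
    total = na + nb
    ka = nb / total if total > 0 else 0.5
    kb = na / total if total > 0 else 0.5
    p = egs2_expected(a - b)
    return a + ka * (q - p), b + kb * ((100 - q) - (100 - p))


# Same inputs as the listings: equal grades, player A loses.
print(gs_update(120, 120, 0))            # (70.0, 170.0)
print(egs2_update(120, 120, 0, 40, 40))  # (95.0, 145.0)
```

With equal activity ('na = nb') each player carries half of the adjustment, which is why the ÉGS2 figures move half as far as the GS ones in this example.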

Note: The formulae are used to calculate a new grade for player 'A' against every opponent 'B' he or she played in the season. At the end of the season the average of these calculated grades (one per opponent 'B') is taken, and this average is player 'A''s new grade for the season. For GS, if a player has not played enough games in the season, games from the previous season or seasons are included in the calculation; for ÉGS2, no games from previous seasons would be taken into account.
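The season calculation described in this note can be sketched as follows, assuming (as stated for ÉGS2) that only the current season's games count. The helper names `egs2_new_grade` and `season_grade`, the tuple layout and the sample opponents are hypothetical; the logistic update is restated so the block stands alone.

```python
import math

G = 25 * math.log(10) / math.log(3)  # EGS2 scale constant


def egs2_new_grade(a, b, q, na, nb):
    """New grade of player A against one opponent B (EGS2 update)."""
    total = na + nb
    ka = nb / total if total > 0 else 0.5
    p = 100 / (1 + 10 ** (-(a - b) / G))
    return a + ka * (q - p)


def season_grade(a, na, results):
    """Average the per-opponent grades.

    results is a list of (opponent_grade, opponent_games, q) tuples,
    where q is A's percentage score against that opponent.
    """
    per_opponent = [egs2_new_grade(a, b, q, na, nb) for b, nb, q in results]
    return sum(per_opponent) / len(per_opponent)


# A win, a loss and a draw against three equally graded, equally active opponents.
results = [(120, 40, 100), (120, 40, 0), (120, 40, 50)]
print(round(season_grade(120, 40, results)))  # 120
```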

Ian Thompson · Post by **Ian Thompson** » Wed Oct 22, 2008 6:11 pm

Ian Thompson wrote:
Chris Majer wrote:A copy of the report to Council on Grading anomalies can be found on the grading section of the ECF website.

There will be an open meeting of the ECF at which the grading team will explain the grading anomalies and the proposed solution. The team will be happy to take questions either at the meeting or in advance. For details again see the ECF website.
My questions are:

1. When the new grades come into effect in a year's time, will the historical calculation of new grades (which goes back 3 or 4 years, I think) also be published, to replace the old grades (e.g. in the online grading database on the ECF website).

2. If the answer to 1. is No, what grades will be in the grading list issued to graders. Will this be 3 years of new grades or a mixture of old and new grades?

3. Will the ECF issue any guidance on how to make a reasonable conversion from old grades to new grades for years prior to those for which new grades have been published. I am thinking of the situation where someone hasn't played for, say, 5 years. They then apply to play in a grading limited event. The organiser then has to use an old grade to decide whether the player is eligible for his event or not.

Anyone know if my questions above were either asked or answered last weekend? Are we going to see a summary of all questions asked, and their answers?

Howard Grist · Post by **Howard Grist** » Wed Oct 22, 2008 11:16 pm

Ian Thompson wrote:1. When the new grades come into effect in a year's time, will the historical calculation of new grades (which goes back 3 or 4 years, I think) also be published, to replace the old grades (e.g. in the online grading database on the ECF website).

There are no plans to do this. Only official grades will be listed on the website, which means old grades up to January 2009 and new grades from July 2009 onwards.

Ian Thompson wrote:2. If the answer to 1. is No, what grades will be in the grading list issued to graders. Will this be 3 years of new grades or a mixture of old and new grades?

The grading list issued to graders will contain each player's current grades, their previous grade (January 2009 for rapid, July 2008 for standard) and points/games totals for the previous three years. All figures given will be in 'new' currency.

Ian Thompson wrote:3. Will the ECF issue any guidance on how to make a reasonable conversion from old grades to new grades for years prior to those for which new grades have been published. I am thinking of the situation where someone hasn't played for, say, 5 years. They then apply to play in a grading limited event. The organiser then has to use an old grade to decide whether the player is eligible for his event or not.

The best you can do here is use the appropriate conversion 'formula' from the grading help page http://grading.bcfservices.org.uk/newgrades.php

Ian Thompson wrote:Are we going to see a summary of all questions asked, and their answers?

The only place that you're likely to see a report of the meeting is http://www.sccu.ndo.co.uk/bcf.htm under item (5) of the ECF Council Meeting.

Carl Hibbard · Post by **Carl Hibbard** » Wed Oct 22, 2008 11:23 pm

Hmmm, publication of the new list depends on "if" people are still even talking to me at that point

Robert Jurjevic · Post by **Robert Jurjevic** » Wed Dec 17, 2008 6:53 pm

Improved ÉGS2...

In order to compensate for anomalies caused by players whose chess abilities change rapidly (which in practice usually results in grade deflation caused by rapidly improving juniors) the following rule can be added to the ÉGS2 calculation (http://www.ecforum.org.uk/viewtopic.php ... &start=210):

3. After the ÉGS2 grades are calculated, an average (arithmetic mean) of the current grade (used in the calculation) and the newly calculated grade is taken for each player, and the ÉGS2 calculation is repeated using the averaged grades (this is in fact a simplified improvement of Glicko 2 over FIDE Élo).

Example: Say a junior's current grade was 90 and the new ÉGS2 calculated grade was 120; his or her averaged grade (to be used in the recalculation) would be (90 + 120) / 2 = 105. The same holds if a player's current grade was 120 and the new ÉGS2 calculated grade was 90.
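Rule 3 can be sketched as a two-pass calculation. The sketch assumes some function `recalc` that maps a dictionary of current grades to newly calculated grades (a stand-in for the full ÉGS2 season calculation); `recalc`, `averaged_grades`, `two_pass` and the toy data are hypothetical names used for illustration only.

```python
def averaged_grades(current, new):
    """Arithmetic mean of current and newly calculated grade for each player."""
    return {name: (current[name] + new[name]) / 2 for name in current}


def two_pass(current, recalc):
    """Run the calculation once, average with the current grades, then rerun."""
    first = recalc(current)
    seed = averaged_grades(current, first)
    return recalc(seed)


# The worked example: 90 -> 120 and 120 -> 90 both average to 105.
current = {"junior": 90, "veteran": 120}
new = {"junior": 120, "veteran": 90}
print(averaged_grades(current, new))  # {'junior': 105.0, 'veteran': 105.0}
```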

Note: It would appear that most chess players wish to improve their chess playing skills: they learn, they practise, they study, etc. Therefore, if a player has improved during the course of the season, it is likely that his or her current grade is lower than it should be. There will always be a number of players in the system (usually a small percentage) who improve rapidly (usually promising juniors, or in fact any juniors with an interest in chess), and it would appear that any system is bound to suffer grade deflation, but rule 3 above should dampen the effect. Please note that rule 3 could be applied to the current Grading System (GS) too (in case ECF officials are reluctant to switch to ÉGS2 but like the idea).

Note: It would also be important to estimate the grades of players entering the system as accurately as possible (though even if this is not done so well, ÉGS2 would account for it, as the grades of newly entered players would change more rapidly, affecting the grades of established players less).

Brian Valentine · Post by **Brian Valentine** » Sun Apr 05, 2009 8:21 am

I realise this contribution is far too late to influence the latest review of grades, but I would like to lay down a marker for the next investigation into the “grading stretch” issue.

In announcing the grading meeting for last October Dave Thomas stated: ‘Players who outgraded their opponents by 10 points were scoring approximately 58% rather than the expected 60%. The system had become "stretched"’.

I want to challenge this concept being unquestionably something that must be fixed next time. The concept has a thread throughout the forum discussion on the grading review, often with different statistics demonstrating the same point.

Let us run a thought experiment. Consider an all-play-all tournament where each player has an unchanging, identical grade. Due to random blunders there is a dispersion of points in the final table, and there is therefore a ranking from winner to last. In any functioning grading system the winner's grade will go up and the last-placed player's will go down. However, if the tournament is rerun, since each player has identical strength, one would expect the winner's grade to fall and the tailender's to rise this time.
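The thought experiment can be tried numerically. The following is a rough simulation sketch, not part of the original argument: it assumes eight players all graded 100, that every game is a win, draw or loss with equal probability, and that a performance is the opponent's grade plus 100 * (result - 0.5) averaged over the games. The function names, the seed and the trial count are arbitrary choices.

```python
import random


def play_all_play_all(n_players, rng, base_grade=100.0):
    """Performance grade of each player in one all-play-all between equal players."""
    scores = [0.0] * n_players
    for i in range(n_players):
        for j in range(i + 1, n_players):
            r = rng.choice([0.0, 0.5, 1.0])  # assumption: loss, draw, win equally likely
            scores[i] += r
            scores[j] += 1.0 - r
    games = n_players - 1
    # Every opponent is graded base_grade, so the averaged per-game
    # performance reduces to base_grade + 100 * (score fraction - 0.5).
    return [base_grade + 100.0 * (s / games - 0.5) for s in scores]


def winner_then_rerun(trials=2000, n_players=8, seed=0):
    """Average performance of the first event's winner, in that event and in a rerun."""
    rng = random.Random(seed)
    first_total = 0.0
    second_total = 0.0
    for _ in range(trials):
        first = play_all_play_all(n_players, rng)
        winner = max(range(n_players), key=lambda k: first[k])
        second = play_all_play_all(n_players, rng)
        first_total += first[winner]
        second_total += second[winner]
    return first_total / trials, second_total / trials


first_avg, second_avg = winner_then_rerun()
print(f"winner's performance, first event: {first_avg:.1f}")
print(f"same player's performance, rerun:  {second_avg:.1f}")
```

Under these assumptions the winner's first-event performance sits well above 100, while the same player's performance in the rerun averages out close to 100, which is the fall-back the thought experiment describes.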

This feature is in all grading systems. For instance it appears in Glickman and Jones’ review of the USCF list in 1997-8 and the latest review of the Scottish rating list by Oglesby.

I understand that the new grades have been fixed to eliminate this observed stretch. Hence I predict that the “stretch” feature will reappear very quickly.

My calculations suggest that if the system is functioning perfectly that a 10 point grading difference should lead to the superior player scoring about 59.5%.

My sums seem to be suggesting that “stretch” is a useful index of how well the system is working. It will increase where other impurities, as identified in this forum, are in the system. My view is moving towards “stretch” being, at least in part, a symptom not the disease itself.

Matthew Turner · Post by **Matthew Turner** » Sun Apr 05, 2009 8:42 am

Brian,
An excellent contribution and you are absolutely right. If players are graded at

100, 110, 120, 130, 140 and 150

but they 'should' be graded at

100, 108, 116, 124, 132 and 140

it makes absolutely no difference and the system would tend towards correcting 'errors'

Sean Hewitt · Post by **Sean Hewitt** » Sun Apr 05, 2009 11:01 am

Brian Valentine wrote:My calculations suggest that if the system is functioning perfectly that a 10 point grading difference should lead to the superior player scoring about 59.5%.

It might be interesting if you posted your calculations.

Brian Valentine · Post by **Brian Valentine** » Sun Apr 05, 2009 12:53 pm

This note should be considered as the first step in a line of enquiry. I believe, but cannot prove yet, that many of the grading impurities discussed in this forum will explain more of the observed stretch. I aim to set out that any variance in the system will lead to “stretch” and that, generally, further statistical error will increase that variance.

In any system the deviation of the observed grade from the true grade introduces a bias that appears to make the people with extreme observed grades perform nearer the average.

Here is a crack at a forecast of this inherent bias, but it does get a bit technical: some statistical knowledge is required to follow the remainder of this note. I apologise for some of the loose notation. In particular, I wrote it up with sigma and integral signs, not realising the forum does not recognise Greek text. I have tried to edit the post using Sum and Int, but might have missed either an S or a funny o, which are their respective replacements. There is also a confusion over the concept of "x given y", where the usual symbol has been replaced by a small 1/2 (½). I've retained this as the least confusing solution (sorry Sean).

If (and only if) the system is functioning properly then each player has a “true grade” G(i) which is assumed constant over the grading period. For this argument I ignore inflation/deflation, the junior problem, the colour problem etc. G(i) is unobservable and this restricts tests on the system.

What we do have is each player’s performance rating. Call this P(i). I will assume that rating and grading are calibrated identically using the ECF system, since that is the one under discussion here.

Ignoring the 40 points restriction each game is rated as:

p(i) = P*(j) + 100*(r(i,j)-.5) (A)

where P*(j) is his opponent’s rating last period and r(i,j) is the result of the game from i’s perspective.

And P(i) = Sum p(i)/n(i) , where n(i) is the number of games rated in the grading period.

There are a number of features that exist if the system is working:

E[ r(i,j)] = 0.5+ .01[G(i)-G(j)] (B)

and applying expectations to all rated games using (A)

n(i)* E[ P(i)] = Sum P*(j) + 100*Sum E[r(i,j)-.5]

which can be rewritten

n(i)* E[ P(i)] = Sum P*(j) + 100*Sum .01*[G(i)-G(j)]

Hence if the rating system is functioning correctly one gets:

n(i)* E[P(i)] = Sum P*(j) + n(i)*G(i) - Sum G(j)

and, I think “functioning correctly” (given everything ignored) should mean that one assumes:

E[Sum P*(j)] = Sum G(j)

and hence get the desirable feature:

E[P(i)] = G(i)

However P(i) will be more dispersed than G(i) because of:
1. the sampling “error” from the results and therefore
2. the sampling errors in Sum P*(j)
3. Other imprecision ignored in this note.

We do have a distribution of P(i). I have downloaded grades2008v5 from the ecf website. Using the (old & long play) grade for all graded players (10,257 in number), the mean grade is 111 and the standard deviation about 46, with a bell shape graph which looks normally distributed (but not statistically tested as such).

I have carried out some investigations on the distribution of P½G (P given G) and estimate the variance as 43.6. My workings for this estimate are in the appendix.

I now assume that:
1. P(i) is distributed N( 111, 1229.4) over all players,
2. Hence one can deduce G(i) is distributed N( 111, 1229.4-43.6=1185.8), and
3. The distributions are independent.

Given stretch should be expected in the system these assumptions give an estimate of the stretch.

using equation (B):

E[ r(i,j)] = 0.5+ .01[G(i)-G(j)]

gives

E[ r(i,j) ½ P(i),P(j) ] = 0.5 + .01 * Int Int Pr(G(i)½P(i)) * Pr(G(j)½P(j)) * [G(i)-G(j)] dG(i) dG(j)

which becomes

0.5 + .01*[ E[G(i)½P(i)] - E[G(j)½P(j)] ]

This can be shown to be (I dusted off my old statistics text book on Bayes Theorem):

0.5+ .01* 1185.8/1229.4*[P(i)-P(j)]

Plugging in P(i)-P(j) = 10 gives about 59.5%.
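The last step is simple arithmetic and can be checked with a few lines of Python. This is only a sketch using the variances quoted in the post (1229.4 for observed grades and 43.6 for the sampling noise); the function name `expected_score` is a hypothetical label.

```python
def expected_score(p_diff, var_p=1229.4, var_noise=43.6):
    """Expected score (0..1) for a player whose observed grade is p_diff points higher."""
    var_g = var_p - var_noise  # variance of the true grades, 1185.8
    shrink = var_g / var_p     # weight given to the observed grade difference
    return 0.5 + 0.01 * shrink * p_diff


print(round(100 * expected_score(10), 1))  # 59.6
```

Evaluated this way the figure comes out at roughly 59.6%, in line with the "about 59.5%" quoted in the post.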

APPENDIX

Variance of p(i)

p(i) = P*(j) + 100*(r(i,j)-.5)

The mean of r(i,j) comes from

E[ r(i,j)] = 0.5+ .01[G(i)-G(j)]

The variance is more difficult. My big database (a megabase from ChessBase, maintained by adding TWIC) gives a draw quotient of 32%. The 4NCL database shows 37% of draws.

I shall assume 1/3 of games are drawn for illustration. As I have no easy way of analysing further I shall assume all variation in expected result comes from increasing wins with draws staying a constant proportion. I know this is unrealistic, but it gives a first shot.

Call r(i,j)-.5 = s and [G(i)-G(j)]=g, then

E(s)= .01g

P(win) = .01g+ 1/3
P(loss)= 1/3-.01g

E(s^2) = (.01g + 1/3)*(.5*.5) + (1/3 - .01g)*(-.5*-.5)

= 1/6

Since this is close to the variance if P(draws)=0 it looks as though the assumption is insensitive.

Var(s) = E(s^2) - [E(s)]^2 = 1/6 - (.01g)^2

Var[p(i)] = 100^2*[1/6 - (.01g)^2]

= 1667 - g^2

If n is 30 and g is 20 (the latter approximated from sd of my recent graded games) then

Var[p(i)] ≈ 1267

Var[P(i)] ≈ 1267/30 = 42.2

and, given this variance, the variance in the average of P*(j) is 42.2/30 ≈ 1.4, so the combined variance is 42.2 + 1.4 = 43.6.

Hence the variance of G can be estimated by:

V(P) = V(G)+ 43.6

V(P) = 1229.4 so V(G)=1185.8
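The appendix arithmetic can be reproduced as follows. This is a sketch under the appendix's own assumptions (one third of games drawn, n = 30 games, g = 20); the function name `performance_variance` is a hypothetical label.

```python
def performance_variance(n=30, g=20.0):
    """Return (Var[p(i)] per game, Var[P(i)] per season, combined variance)."""
    p_win = 1 / 3 + 0.01 * g               # draws assumed constant at 1/3
    p_loss = 1 / 3 - 0.01 * g
    e_s = 0.5 * p_win - 0.5 * p_loss       # E(s) = .01g
    e_s2 = 0.25 * (p_win + p_loss)         # E(s^2) = 1/6, draws contribute 0
    var_game = 100**2 * (e_s2 - e_s**2)    # Var[p(i)] = 1667 - g^2
    var_season = var_game / n              # Var[P(i)], average over n games
    var_opponents = var_season / n         # variance in the average of P*(j)
    return var_game, var_season, var_season + var_opponents


print([round(v, 1) for v in performance_variance()])  # [1266.7, 42.2, 43.6]
```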

E Michael White · Post by **E Michael White** » Mon Apr 06, 2009 11:19 am

Watch this space it might get interesting if you are interested in this type of thing.

Brian Valentine · Post by **Brian Valentine** » Fri Apr 10, 2009 1:53 pm

This debate has often referred to reasons why stretch might have occurred. Given I believe that stretch of some sort will always exist, is there any evidence that shows such stretch has been different in earlier times?

It is clear it has existed in 2000+, but is there any firm evidence that there has actually been a "drift" (the term used by Dave Thomas given in the council minutes)?

Such evidence would be useful to identify the source of this problem.

Roger de Coverly · Post by **Roger de Coverly** » Sat Apr 11, 2009 10:24 pm

Such evidence would be useful to identify the source of this problem.

There doesn't seem to have been much effort to look at past discussions on grading.

For example, about 20 years ago there was a perception, based on analysis of the mean and median over consecutive seasons, that grades were drifting upwards. There's material about this in the SCCU bulletin (and also in Newsflash) with some references in the BCM. The mechanism for correcting this inflation was to trim back the junior bonus (10 for all ages at the time).

When the statement is made that players 10 points apart don't score 60%, do we know exactly what it is they have measured? For example, if you wanted to test a British Championship from long, long ago, what would you do? If they are just interpreting the results of their recursion routine, then surely great care should be taken with such assertions: first, because we don't know how trustworthy (theoretically sound) the recursion really is, and second, because it is not clear that it is valid to interpret differences between last season's grade and the recursion performance in this manner.

The recent changes to the K factor in the FIDE list remind me of a point made by EM White, namely that the ECF system has its own equivalent of a K factor in the 30 game rule. Thus if you have a 150-graded player who plays 10 games a season and who plays a season at 175, then his new grade is 158 and he won't catch up to his new 175 standard until 30 games have been played. So a calculation which compared this season's performance to last season's grade could give misleading results.
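The 158 figure can be reproduced with a small sketch. It assumes, as a simplification consistent with the numbers in the post, that the shortfall up to 30 games is made up of games scored at the old grade; the function name `thirty_game_grade` and the `window` parameter are hypothetical.

```python
def thirty_game_grade(old_grade, games, performance, window=30):
    """Blend this season's performance with the old grade up to `window` games."""
    if games >= window:
        return performance
    padding = window - games  # games assumed to be carried over at the old grade
    return (games * performance + padding * old_grade) / window


print(round(thirty_game_grade(150, 10, 175)))  # 158
```

With 10 games out of 30, only a third of the 25-point improvement shows up in the new grade, which is the damping effect likened here to a K factor.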

We are of course still waiting for an explanation or statement on the "Rough" issue. If a perturbation of setting a junior flag incorrectly can cause a "new" grade to move by 43 points then this casts serious doubts on either the theory or the implementation of the project.

It seems to me that a grading system is not just a predictor of results, it's also supposed to be self correcting. That is if the estimate (grade) of a player's strength is incorrect, then the system should contain measures to self correct.

Keith Arkell · Post by **Keith Arkell** » Fri Apr 17, 2009 3:54 am

Hello good people of the "grading" forum.

I was just a bit curious about something:

OK, I have to admit that I (along with most players at the upper end of the grading spectrum) don't really give 2 hoots what my ECF grade is, but something puzzles me. I see that my grade apparently is "232" and underneath this in red letters is my "new grade" of "229" based on "34 games".

Can somebody please tell me what I am missing? I mean, what is this silly red figure about? According to a very brief approximation, since June 2008 I have played about 110 graded games in the UK, and it looks to me like I have performed at about 244. I have played in quite a few very strong events, and in total have lost 6 games (all to strong GMs), drawn about 27 and won the rest (about 77), including loads of club games for Widnes, Heywood and 6 games in the 4NCL.

Carl Hibbard · Post by **Carl Hibbard** » Fri Apr 17, 2009 6:59 am

Keith Arkell wrote: Hello good people of the "grading" forum.

Hmm, to be honest I have been expecting a new official ECF forum on the subject of grading or the CFS project although it hasn't happened yet!

Keith Arkell wrote: Can somebody please tell me what I am missing? I mean, what is this silly red figure about? According to a very brief approximation, since June 2008 I have played about 110 graded games in the UK, and it looks to me like I have performed at about 244. I have played in quite a few very strong events, and in total have lost 6 games (all to strong GMs), drawn about 27 and won the rest (about 77), including loads of club games for Widnes, Heywood and 6 games in the 4NCL.

It's your new and rather "controversial" grade, see:-

http://grading.bcfservices.org.uk/newgrades.php

Neill Cooper · Post by **Neill Cooper** » Fri Apr 17, 2009 7:43 am

Keith Arkell wrote: OK, I have to admit that I (along with most players at the upper end of the grading spectrum) don't really give 2 hoots what my ECF grade is, but something puzzles me. I see that my grade apparently is "232" and underneath this in red letters is my "new grade" of "229" based on "34 games".

It is a new calculation of the 232 grade, not an estimate of your 2008/9 grade.
