Brian Valentine wrote:Sean,
This is interesting. Are you suggesting the work you did is indeed the mysterious "mathematical modelling" referred to in my quote?
The modelling, as I read it, is what is supposed to explain the results you demonstrated. I think Roger and I are concerned that this stretch may have always been there (more or less). The "mathematical modelling" quote implies there is a theory behind the effect increasing. I have trouble replicating this effect.
Hi Brian,
No, my work is not the mathematical modelling referred to previously. That was done by the grading team after I had identified the problem. I was not involved (primarily because the grading team did not want to believe my findings).
Essentially what happened was that I was approached in 2006 to see if I could do some work on the grading system, as some believed they had observed that the grading system had "gone wrong" and was no longer working as it should. I asked for and received all of the individual result data from the previous season.
I then extracted games where the following applied:-
1. The player had a published SP grade in 2005
2. Only SP games played between 1/6/05 and 31/5/06 against players in the sample group would count
3. The player had a minimum of 5 games in the sample
4. The players games yielded a score between 15% and 85%
5. Criteria 2 to 4 were then re-applied to the shrinking sample, repeatedly, until every remaining player met them (a sketch of this selection loop follows the list)
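Purely to illustrate the mechanics of that selection loop (this is not the code I actually used; the data layout, field names and the stopping test are assumptions), it amounts to something like:

```python
# Illustrative sketch of the iterative sample selection described above.
# Assumes each game appears once per player (so twice per game), with a
# score of 1, 0.5 or 0 from that player's point of view.  Field names and
# data layout are assumptions, not the 2006 implementation.

def select_sample(games, graded_2005, min_games=5, lo=0.15, hi=0.85):
    pool = set(graded_2005)                       # criterion 1
    while True:
        in_sample = [g for g in games             # criterion 2
                     if g["player"] in pool and g["opponent"] in pool]
        counts, totals = {}, {}
        for g in in_sample:
            counts[g["player"]] = counts.get(g["player"], 0) + 1
            totals[g["player"]] = totals.get(g["player"], 0.0) + g["score"]
        keep = {p for p in pool                   # criteria 3 and 4
                if counts.get(p, 0) >= min_games
                and lo <= totals[p] / counts[p] <= hi}
        if keep == pool:                          # sample is stable - stop
            return in_sample, pool
        pool = keep
```

Removing a player changes other players' game counts and scores, which is why the selection has to keep running until nothing more drops out.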
The sample produced 6302 players who played 61375 games between them.
I then used an iterative process to work out grades for these players based only on games within the sample group. The calculation process followed the ECF system with two exceptions. Firstly, juniors did not receive a junior supplement. Secondly, the 40 point rule was not applied (as it is not appropriate for this kind of iteration).
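A minimal sketch of that kind of iteration is below (again illustrative rather than my actual implementation; the seed grades, tolerance and iteration cap are assumptions, while the per-game arithmetic is the standard ECF calculation of opponent's grade plus 50 for a win, minus 50 for a loss, unchanged for a draw, averaged over the games):

```python
# Minimal sketch of the iterative grading calculation described above:
# each player's grade is the average of (opponent's grade +50 for a win,
# -50 for a loss, +0 for a draw) over their sample games, with no junior
# supplement and no 40-point rule.  Seed grades, tolerance and the
# iteration cap are assumptions for illustration.

def iterate_grades(games, seed_grades, tol=0.01, max_iters=500):
    grades = dict(seed_grades)                    # e.g. the published 2005 grades
    for _ in range(max_iters):
        totals, counts = {}, {}
        for g in games:                           # one row per player per game
            p, opp, score = g["player"], g["opponent"], g["score"]
            pts = grades[opp] + (50 if score == 1 else -50 if score == 0 else 0)
            totals[p] = totals.get(p, 0.0) + pts
            counts[p] = counts.get(p, 0) + 1
        new = {p: totals[p] / counts[p] for p in totals}
        if max(abs(new[p] - grades[p]) for p in new) < tol:
            return new                            # converged
        grades = new
    return grades
```

Note that an iteration like this really only pins down the differences between players, which is why (as mentioned further down) I expressed everyone's result as so many points below the top player's performance.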
The results were stark. There was stretching, or deflation, which could be measured by the approximation Perf = Grade * 0.7752 + 48.7816. Moreover, the number of games played made no difference: whether a player had 5, 10, 15, 20, 25 or 30 games in the sample, the stretch was the same.
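To put a number on the stretch: a slope of 0.7752 means that two players published 100 grading points apart performed, on average, only about 78 points apart within the sample. In other words, the published list was spread roughly 1/0.7752 ≈ 1.29 times wider than the season's results supported.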
I then noticed (purely by accident really) that if you applied the above formula to all grades uniformly and then looked at FIDE ratings, the old formula of ECF * 8 + 600 = FIDE worked well for players graded above 130, although it probably needed to be + 700, which at the time I took to be inflation in the FIDE rating system (also bear in mind that my grading solution produced new grades that were lower than the ECF's new grades).
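Expressed as a formula, my reading of that cross-check is below; the 0.7752/48.7816 coefficients are from the regression above and the +600/+700 offsets are as discussed, not anything official:

```python
# Sketch of the FIDE cross-check described above: apply the stretch
# regression to a published ECF grade, then use the old ECF-to-FIDE
# conversion.  offset=700 reflects the suspected FIDE inflation; 600 is
# the traditional figure.

def adjusted_grade(ecf_grade):
    return 0.7752 * ecf_grade + 48.7816

def fide_estimate(ecf_grade, offset=600):
    return 8 * adjusted_grade(ecf_grade) + offset

# e.g. a published 160 adjusts to about 172.8, giving roughly 1982
# (or about 2082 with offset=700)
```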
I identified two causes of this stretch. The first was ungraded players, whose performance in a small number of games is not considered statistically reliable enough to warrant publishing a grade, yet whose unreliable grading performance is used to calculate the grades of others! I therefore suggested that games against ungraded opponents should not be graded. To which one member of the grading team replied, "We can't do that. How would we charge game fee?!"
The second cause was juniors. Of 927 juniors with a standard play grade in the 2006 list, 265 increased their grade (compared to 2005) by more than 10 points and 100 increased their grade by 20 points or more. These juniors tended to be more active than average, multiplying the effect of their under-gradedness. I therefore looked at the junior supplement for each age and found it to be totally inadequate. Indeed, I could find no correlation between age and rate of improvement at all (as others far better versed in junior chess have subsequently observed). I hypothesised at the time that "This could be due to rising numbers of junior only tournaments where the output is not verified by the participation of established, statically graded players. It seems to me that junior players are likely to be having a serious deflationary effect upon the list as a whole and the best solution appears to be to treat them as new players each year, as we are doing now with players with negative grades." I must admit that I had forgotten that I said this!!
I did understand at the time that a small minority of players like to calculate their grades on an ongoing basis. To minimise the disruption to them, I therefore suggested that the above should apply only to improving juniors, and proposed (arbitrarily) that only juniors increasing in strength by 10 points have this treatment applied to them.
As I say, the grading team simply did not want to believe that there was any basis in what I said and published a statement on the ECF website to that effect. I understand that they then went off and did their own modelling for 18 months to try to disprove what I had said. I obviously don't know what they did, as I wasn't involved at all, but I believe they used 5 years' data (before then, individual results were not reported) to try to disprove my findings, only to discover that their results mirrored my own.
Hope that history helps.
Roger de Coverly wrote:Sean Hewitt wrote:I took all results for a season and worked out a ranking list for all players, irrespective of ECF grade, based purely on those results and the ECF grading formula. I also established how many points below the top player's performance every other player's performance was.
That's surely an equivalent process to the one used for new player estimation. As Brian suggests, there are a number of possible methods of processing a season of results which should converge to the same end list. At least one concern with using just one year's data is that it leaves out the lag/stabilisation effect of including past years' results in the published list. Objectively, it's also a list that you should compare not just to the begin-season grades but also to the end-season grades.
It's equivalent in that they should produce the same result if the same methodology is used. But I excluded high- and low-scoring players from my iteration (because it was obvious that they would infect the results), whereas it seems the ECF did not. And I still don't know precisely how they calculate initial estimates for ungraded players!
Roger de Coverly wrote:So you have (begin season list) which has (actual results) applied to it to produce (end season list). The ranking approach is taking function(actual results). Obviously you can compare the output of this function to both (begin season list) and (end season list) but you wouldn't expect them to be identical.
I agree that you wouldn't expect the lists to look the same. But, more importantly, you shouldn't expect in your wildest dreams that the vast majority of players in the list would decline in grade, yet that is what was actually observed.
Roger de Coverly wrote:
The difference from the published list would also contain lag and player-inconsistency effects. As far as I recall you got a tighter range between top, middle and bottom than the published list. That's also indicative of non-linearity: a 140 scores 60% against a 130, who scores 60% against a 120, who scores 60% against a 110, but the 140 doesn't score 80% against the 110. An effect seen in both the Grist data and the Scottish data.
Not something I looked at, Roger. I would have done if I had been asked to try to come up with a fix though!
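(For anyone following the arithmetic in Roger's example: under the ECF calculation described earlier, maintaining a grade requires scoring 50% plus one percentage point for every grading point of difference, so a 140 is expected to score 60% against a 130 and 80% against a 110. Observing noticeably less than 80% in the second case is the non-linearity he refers to, and a purely linear grading formula cannot reproduce it.)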