4NCL Online

Venues, fixtures, teams and related matters.
Matthew Turner
Posts: 3600
Joined: Fri May 16, 2008 11:54 am

Re: 4NCL Online

Post by Matthew Turner » Sat Aug 15, 2020 8:31 am

Roger,
Naturally, if a significant number of players are cheating the resulting distribution will move away from a normal distribution, but this is irrelevant. What matters is whether the underlying distribution (without cheating) is normal. Your investigation seems to suggest that it might well be. That's encouraging, so it looks like it works then.

Roger Lancaster
Posts: 1906
Joined: Tue Mar 17, 2015 2:44 pm

Re: 4NCL Online

Post by Roger Lancaster » Sat Aug 15, 2020 10:55 am

Matt

Yes, broadly - but only broadly - agreed. One is left to speculate what the distribution for the online participants would have been, in the event that none had cheated. Let's leave aside the complication of different people [OTB participants versus online participants] and I think there are empirical reasons - e.g. faster time control, more distractions - to suppose that the mean performance of online participants would be worse than that of OTB participants. I can't see any easy method of quantifying this but it's a 'known unknown' which might affect other lines of argument.

I would also expect the SD, essentially a measure of variability, to be higher for online participants. I think it's possible to demonstrate this although there's always the danger of a logical fallacy in one's reasoning. But here goes.

The online distribution [with cheating] shows just 58% in the 1>z>-1 band whereas one would expect an online distribution [without cheating] to have 68% in this band. So, if z were the appropriate SD for the online distribution [without cheating] then the cheating would appear to have resulted in the displacement of 10% of the data from the 1>z>-1 band to the z>3 band. But, since the z>3 band in the online distribution [with cheating] contains less than 10% of the data, this cannot be so. Accordingly, the SD for the online distribution [without cheating] must be greater than z. That probably corresponds with what one might empirically have expected, with some people performing more erratically online.
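As a rough check on the reasoning above, the sketch below asks how wide a cheat-free Normal distribution would have to be for only 58% of it to fall inside the 1>z>-1 band. It assumes, purely for illustration, that the online data are Normal with the same mean as the OTB data and that the whole shortfall comes from extra spread; the function name is made up for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1: the OTB reference scale


def sd_for_central_share(central_share):
    """SD (in units of the reference SD) of a zero-mean Normal
    distribution that puts `central_share` of its mass in -1 < z < 1."""
    if not 0 < central_share < 1:
        raise ValueError("central_share must be strictly between 0 and 1")
    # P(-1 < X < 1) = 2 * Phi(1 / sd) - 1, solved for sd.
    return 1 / STD_NORMAL.inv_cdf((1 + central_share) / 2)


expected_share = STD_NORMAL.cdf(1) - STD_NORMAL.cdf(-1)  # the familiar 68%
implied_sd = sd_for_central_share(0.58)  # 58% is the figure quoted in the thread

print(f"share within one SD for a standard Normal: {expected_share:.3f}")
print(f"SD needed for a 58% share: {implied_sd:.2f}")
```

On these assumptions the implied SD comes out at roughly 1.24 times the OTB figure. Under this simple model that is an upper limit, since any cheats sitting outside the band would account for part of the shortfall.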

My conclusion is simply that - even if one accepts that the data from which Alex starts is accurate, and not everyone has total faith in Ken Regan's calculations - the graph is somewhat misleading in that, through using z as the online SD, it exaggerates the number of probable cheats.

Matthew Turner
Posts: 3600
Joined: Fri May 16, 2008 11:54 am

Re: 4NCL Online

Post by Matthew Turner » Sat Aug 15, 2020 12:46 pm

So just to be clear, your conclusion is that you think more people play objectively 600 points more than their rating online as compared to OTB. It is an interesting theory, but I think it is more likely that people are 'missing' from the centre of the distribution because of mouse slips.

Roger Lancaster
Posts: 1906
Joined: Tue Mar 17, 2015 2:44 pm

Re: 4NCL Online

Post by Roger Lancaster » Sat Aug 15, 2020 1:26 pm

No, just to be clear, that's definitely not what I'm saying.

I'm saying that the displayed distribution for online players [some of whom we all agree must be cheating] shows just 58% of values in the range 1>z>-1. If the distribution for online players [without cheating] possessed the same mean and SD as that for OTB players, then a Normal distribution would indicate 68% in that range. There's a 10% loss from that range when one moves from the 'no cheating' to 'with cheating' graph from which one might infer that those 10% are cheaters - why else have they vanished from the central range ? But that's unlikely, not only in itself but because the z>3 range contains fewer than 10% of the data. So the only explanation which appears to make sense is that the SD for non-cheating online players is somewhat higher than z, the SD for over-the-board players. In other words, the values for non-cheating online players will be more widely dispersed than for over-the-board players [something which, empirically, I find unsurprising] which goes some way towards explaining why they are disproportionately represented at the extremities of Alex's graph. It's not a rebuttal of his thesis, simply a reflection that the graph somewhat misrepresents the scale of the problem.

Matthew Turner
Posts: 3600
Joined: Fri May 16, 2008 11:54 am

Re: 4NCL Online

Post by Matthew Turner » Sat Aug 15, 2020 1:37 pm

Roger,
I beg to differ, but this is what you are saying
"So just to be clear, your conclusion is that you think more people play objectively 600 points more than their rating online as compared to OTB. It is an interesting theory, but I think it is more likely that people are 'missing' from the centre of the distribution because of mouse slips."

Roger Lancaster
Posts: 1906
Joined: Tue Mar 17, 2015 2:44 pm

Re: 4NCL Online

Post by Roger Lancaster » Sat Aug 15, 2020 1:58 pm

Matt

I can't actually work out how you arrive at that conclusion, which is neither what I've said nor what I believe. The gist of Alex's article was that a number of online players cheat and thereby get much better online results than appears plausible from their over-the-board performances, whether 600 points better or some other figure. I disagree only as to the extent to which this happens - and the gist of my contributions here has been to say that I believe Alex's graphs exaggerate the scale of the problem.

Your point about mouse slips could well also be a cause for online players' performance showing a higher SD but that's a different matter.

Matthew Turner
Posts: 3600
Joined: Fri May 16, 2008 11:54 am

Re: 4NCL Online

Post by Matthew Turner » Sat Aug 15, 2020 3:55 pm

Roger,
You have observed that there are fewer players in the middle of the distribution than is predicted. I contend that this is because 1. A significant number of players cheat 2. Players underperform because of mouse slips, or perhaps because at home they have a couple of glasses of Chardonnay whilst playing.
You contend that the data is more spread out because players are more likely to objectively perform 600 points above their rating online. I am sure people are able to assess for themselves which explanation is most likely.

Roger Lancaster
Posts: 1906
Joined: Tue Mar 17, 2015 2:44 pm

Re: 4NCL Online

Post by Roger Lancaster » Sat Aug 15, 2020 5:28 pm

Matt, the contention that I have said, or believe, that "the data is more spread out because players are more likely to objectively perform 600 points above their rating online" is a fictional product of your imagination and I do not intend to keep on rebutting it here ad nauseam.

These accusations appear to be a tactic to divert attention from the message from my first post which observed that, while the CHESS article was well received, it contained a graph which exaggerated the extent of cheating in online 4NCL - presumably something you wish to suppress.

Paul Cooksey
Posts: 1519
Joined: Fri Oct 21, 2016 4:15 pm

Re: 4NCL Online

Post by Paul Cooksey » Sat Aug 15, 2020 5:46 pm

Roger, I also thought you were arguing that the data showed a greater variation in results online. Is that incorrect?

(I gave up maths at 18, and I didn't even do stats A-level since I was advised it was for the thick kids...)

Matthew Turner
Posts: 3600
Joined: Fri May 16, 2008 11:54 am

Re: 4NCL Online

Post by Matthew Turner » Sat Aug 15, 2020 6:10 pm

Roger,
I have no desire to suppress anything, but it is a bit frustrating that you don't seem to understand the point you are making. If you believe that the data is more spread out in online chess then you believe that it is more likely that a player will objectively outperform their rating by 600 points online as compared to over the board. Why choose 600 points? Well, that corresponds to a Z score of 3 over 5 or 6 games. We could choose 300 points if you like, but the point still stands.

John McKenna

Re: 4NCL Online

Post by John McKenna » Sat Aug 15, 2020 7:03 pm

Roger Lancaster wrote:
Sat Aug 15, 2020 8:10 am
John McKenna wrote:
Fri Aug 14, 2020 9:23 pm
"... whether a Normal distribution is likely to give a reliable result when it is most unlikely that the data actually conforms to a Normal distribution."

I believe whether or not a data set conforms to a Normal distribution can be tested for.

The extract reproduced by John was intended as a maxim of general application. In the particular case of the online data, it's part of my case that it patently doesn't conform.

Thanks, Roger, but whether we are discussing a "general application... " or a "particular case... " any and all datasets can be tested (in a variety of ways).

Here we are not in possession of the raw data and so cannot be sure of its statistical properties.
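For anyone who does obtain the raw figures, one of the simpler checks is sketched below: the Jarque-Bera test, which compares a sample's skewness and kurtosis with the values a Normal distribution would give. Since the real data are not available here, the two samples are synthetic stand-ins invented for this example (evenly spaced Normal quantiles, with and without a hypothetical 5% block of results at z = 4), and the helper names are likewise made up. The p-value relies on the large-sample chi-square approximation with 2 degrees of freedom, which is unreliable for small samples.

```python
import math
from statistics import NormalDist


def jarque_bera(sample):
    """Return (JB statistic, approximate p-value) for a list of numbers."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n  # variance
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Survival function of a chi-square variable with 2 degrees of freedom.
    return jb, math.exp(-jb / 2)


def normal_quantile_sample(n):
    """Deterministic stand-in for Normal data: n evenly spaced quantiles."""
    std = NormalDist()
    return [std.inv_cdf((i + 0.5) / n) for i in range(n)]


clean = normal_quantile_sample(1000)
# Hypothetical contamination: 5% of the results replaced by values at z = 4.
contaminated = normal_quantile_sample(950) + [4.0] * 50

for label, data in (("clean", clean), ("contaminated", contaminated)):
    jb, p_value = jarque_bera(data)
    print(f"{label}: JB = {jb:.2f}, p = {p_value:.3g}")
```

A small p-value says only that the sample is unlikely to be Normal; it does not say why, so it cannot by itself separate cheating from an honestly wider or skewed spread.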

Therefore I, for one, am inclined to believe that the objections you have raised have either been taken into account, or can be accounted for, by those who used the data to draw their statistical graph and conclusions.

Unfortunately said persons are unlikely to post here and debate the matter because it is beyond the scope of this forum.

Some truths are evident -

The scope for "cheating" in online chess is far, far greater than it is in over-the-board chess. Therefore incidences of it will be significantly higher.

The ability to catch "cheats" online must be predicated on statistical methods and s/w tools in the first place if all levels of performance are to be checked without wasting valuable human effort that should be saved for later in the evaluation and appeals processes.

In fact, as has been evident already on the forum, it is in those latter stages of the processes that the main problems lie. The statistical precursors to the working of the online wheels of justice are just a more transparent police force to a succeeding opaque court system.

In such a system, once arrested, it is necessary to have the advocates and connections to win your case. (Again, that had been evidenced here on the forum anecdotally in some interesting cases.)
Last edited by John McKenna on Sun Aug 16, 2020 1:07 pm, edited 2 times in total.

Roger Lancaster
Posts: 1906
Joined: Tue Mar 17, 2015 2:44 pm

Re: 4NCL Online

Post by Roger Lancaster » Sat Aug 15, 2020 7:13 pm

Paul, part of my argument was that the online data showed - as I felt might intuitively have been expected - a greater dispersion [variously known also as variation, scatter or spread] than the over-the-board data. The intuitive reasons were more or less those mentioned by Matt - primarily distractions of one sort or another in a home environment as opposed to the atmosphere of a 4NCL hotel and, perhaps, less commitment in some cases.

The complication was this. Essentially one can think of three separate graphs:
A] Over-the-board performance [assuming no cheating occurs]
B] Online performance [assuming no cheating occurs]
C] Online performance [actual results, including presumed cheating]

The one thing that can be said with certainty is that both the mean and the standard deviation [SD] will, because cheating produces outlier results at the positive end of the scale, be higher for C than for B. However, what one would ideally like to know, as a starting-point for any further thoughts, was how A and B [and not A and C] compared - in other words, whether OTB performance was objectively better than [and also objectively more variable than] online performance, cheating having been taken out of the equation.

Alex included A and C but couldn't include B for the simple and obvious reason that no such raw data existed. The question then arose, how to redistribute some or all of the extreme data in C more centrally so as to try to arrive at B - in other words, how to attempt to nullify the effects of cheating on online performance. In this, I was greatly assisted by the knowledge that I should be aiming for a Normal distribution. The conclusion I reached was that, in order for A and B to have the same SD, only a relatively small proportion of the extreme data from C could be redistributed. This, in my opinion, although I am open to reasoned argument on this point, meant that B must have a larger SD than A which didn't come - see first para - as any great surprise to me.
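The trade-off described above, between how much of C is put down to cheating and how wide B must then be, can be sketched numerically. This is a simplified model rather than the method used for the article: it assumes B is Normal with the same mean as A, that every cheat ends up outside the 1>z>-1 band, and that the 58% figure quoted earlier in the thread is accurate. The function names are invented for the illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
OBSERVED_CENTRAL = 0.58  # share of graph C inside -1 < z < 1, as quoted in the thread


def honest_sd_given_cheats(cheat_share, observed_central=OBSERVED_CENTRAL):
    """SD of B, in units of A's SD, if `cheat_share` of C are cheats
    who all lie outside the central band (a simplifying assumption)."""
    honest_central = observed_central / (1 - cheat_share)
    if not 0 < honest_central < 1:
        raise ValueError("cheat_share is inconsistent with the observed band")
    # For a zero-mean Normal, P(-1 < X < 1) = 2 * Phi(1 / sd) - 1.
    return 1 / STD_NORMAL.inv_cdf((1 + honest_central) / 2)


def cheat_share_for_equal_sd(observed_central=OBSERVED_CENTRAL):
    """Cheat share needed for B to keep exactly the same SD as A."""
    normal_central = STD_NORMAL.cdf(1) - STD_NORMAL.cdf(-1)
    return 1 - observed_central / normal_central


for share in (0.0, 0.05, 0.10):
    print(f"cheat share {share:.0%}: SD of B = {honest_sd_given_cheats(share):.2f}")
print(f"cheat share needed for equal SDs: {cheat_share_for_equal_sd():.1%}")
```

Under these assumptions roughly 15% of online players would have to be cheats for B to keep A's SD. That is a little more than the 10 points missing from the band, because about a third of those players would have fallen outside the band anyway. Any smaller cheat share implies that B has a larger SD than A.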

The point of all this is that, if online performance naturally [that is, in an environment where no cheating occurs] has a larger SD than over-the-board performance [again assuming no cheating], then this goes some way towards explaining why there are proportionately more online outliers in Alex's graph. As I've said before, it doesn't contradict his assertion that there are plenty of cheats but suggests there may be rather fewer than he supposes. Sorry if that's a somewhat convoluted explanation but I hope it's of some help.
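To put a rough number on this, the sketch below compares the share of honest results expected above z = 3 under the OTB spread with the share under a wider online spread. The wider SD is an assumed value, namely the one at which only 58% of a zero-mean Normal distribution falls inside the 1>z>-1 band; both distributions are assumed Normal with the same mean, and `share_beyond` is a name made up for the example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def share_beyond(z_cutoff, sd=1.0):
    """Share of a zero-mean Normal with the given SD lying above z_cutoff."""
    return 1 - NormalDist(0, sd).cdf(z_cutoff)


# Assumed wider SD: the value at which only 58% of a zero-mean Normal
# falls in -1 < z < 1 (58% being the figure quoted in the thread).
wider_sd = 1 / STD_NORMAL.inv_cdf((1 + 0.58) / 2)

otb_tail = share_beyond(3)
online_tail = share_beyond(3, wider_sd)

print(f"assumed online SD: {wider_sd:.2f}")
print(f"honest share above z = 3, OTB spread: {otb_tail:.4%}")
print(f"honest share above z = 3, wider spread: {online_tail:.4%}")
print(f"ratio: {online_tail / otb_tail:.1f}")
```

With these inputs the honest share above z = 3 rises from a little over 0.1% to a little under 0.8%. That is several times larger, which would account for some of the extra online outliers, though not for a tail holding several percent of the data.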

Roger Lancaster
Posts: 1906
Joined: Tue Mar 17, 2015 2:44 pm

Re: 4NCL Online

Post by Roger Lancaster » Sat Aug 15, 2020 7:48 pm

John, the most obvious flaw with the original graph is that it treats two distributions as having the same mean and SD despite the fact that, while one has 68% of the data falling within the 1>z>-1 range as expected for a Normal distribution, the other has only 58%. All else follows from that.

John McKenna

Re: 4NCL Online

Post by John McKenna » Sat Aug 15, 2020 10:59 pm

Thanks for the above reminder, Roger.

As I said, I find it hard to believe the people responsible for the underlying statistical treatments of the data available to them - both in the 4NCL and the wider online & otb chess world - have somehow got it basically wrong.

There have been some criticisms, over the years, of Prof. A. Elo's methodology pertaining to the rating of chessplayers. Yet, the USCF's & FIDE's implementations of his system have continued to be developed and used until today without that many qualms and nobody has managed to replace it with an alternative (although Jeff Sonas has one).

I think there is no alternative at present but to trust the current anti-cheating methodology & s/w. Validating it is an important matter, but it is difficult to do on a primarily chess forum. Acting on it in terms of who and how to sanction is another matter.

Roger Lancaster
Posts: 1906
Joined: Tue Mar 17, 2015 2:44 pm

Re: 4NCL Online

Post by Roger Lancaster » Sun Aug 16, 2020 2:43 am

John McKenna wrote:
Sat Aug 15, 2020 10:59 pm

As I said, I find it hard to believe the people responsible for the underlying statistical treatments of the data available to them - both in the 4NCL and the wider online & otb chess world - have somehow got it basically wrong.

John, I don't want to dwell on this forever but my comments yesterday related to one specific graph which was the work of a single human or small group of humans rather than endorsed by "the wider online & otb chess world".
