Some thoughts on anti-cheating systems

Paul McKeown · Post by **Paul McKeown** » Mon Jun 01, 2020 1:12 am

NickFaulks wrote: ↑
Sun May 31, 2020 11:32 pm

Joseph Conlon wrote: ↑
Sun May 31, 2020 11:07 pm
1. For all the talk of secrecy, lichess is open-source. The anti-cheating software is at https://github.com/clarkerubber/irwin
Can someone who understands these things confirm that Lichess really do reveal their anti-cheating methods here? If so, why do they demand such a ferocious NDA?

It's a neural net, which needs to be trained by examining games:

Build a database of analysed players

If you do not already have a database of analysed players, it will be necessary to analyse a few hundred players to train the neural networks on.

python3 main.py --no-assess --no-report

About

Irwin (named after Steve Irwin, the Crocodile Hunter) started as the name of the server that the original cheatnet ran on (now deprecated). This is the successor to cheatnet.

Similar to cheatnet, it works on a similar concept of analysing the available PVs of a game to determine the odds of cheating occurring.

This bot makes improvements over cheatnet by taking a dramatically more modular approach to software design. modules/core contains most of the generic datatypes, BSON serialisation handlers and database interface layers. It is also significantly faster due to a simplified approach to using stockfish analysis.

modules/irwin contains the brains of irwin, this is where the tensorflow learning and application takes place.

Irwin has been designed so that modules/irwin can be replaced with other approaches to player assessment.

Env.py contains all of the tools to interact with lichess, irwin, and the database handlers.

main.py covers accessing the lichess API (modules/Api.py) via Env to get player data; pulling records from mongodb, analysing games using stockfish, assessing those games using tensorflow and then posting the final assessments.

How reliable that neural net actually is at determining moves played through assistance versus moves played through accepted methods is moot.

They won't after all have much in the way of independent corroboration.

So the suspicion will remain that they will be making decisions, "because our neural net says so, and neural net is a weighty term to use in any conversation."

MartinCarpenter · Post by **MartinCarpenter** » Mon Jun 01, 2020 9:12 am

The problem isn't so much that it is moot as that no one will ever be able to know.

NNs are brilliant in some ways, but all they do is latch onto all of the correlations in the data, causative/meaningful or not. Those then get trained down to an internal representation, so that absolutely no one can actually say why the net made the decision it did.
(Improving this is a very active field of research.)

Most worryingly, because they've got no real judgement, when they do make an (often rare) mistake it isn't a matter of being, say, 5% out. The mistakes are often enormous and daft.

They're also very sensitive to overtraining on patterns in the training set that then don't generalise well to the real world - and I'm really unsure how large/good their training set can possibly be here, given you need games played by known cheaters.
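The overtraining worry can be shown with a toy example. This is only a sketch under stated assumptions: a 1-nearest-neighbour "memoriser" stands in for an over-trained network, and the 50 training "games" have random features and coin-flip labels, so there is nothing genuine to learn. None of this is taken from Irwin.

```python
import random

def nearest_label(train, x):
    """Label of the training point closest to x (a pure memoriser)."""
    return min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))[1]

def accuracy(train, data):
    """Share of points in `data` whose label the memoriser gets right."""
    hits = sum(nearest_label(train, x) == y for x, y in data)
    return hits / len(data)

def random_game(rng, n_features=5):
    # Hypothetical data: features are noise and the "cheater" label is a coin flip.
    return [rng.random() for _ in range(n_features)], rng.random() < 0.5

rng = random.Random(0)
train = [random_game(rng) for _ in range(50)]    # small training set
test = [random_game(rng) for _ in range(1000)]   # unseen games

train_acc = accuracy(train, train)   # each point is its own nearest neighbour
test_acc = accuracy(train, test)     # labels were noise, so this sits near chance
print(f"train accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}")
```

The model looks perfect on the games it was trained on and is no better than guessing on new ones, which is the failure mode to watch for when known-cheater games are scarce.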

The statistical stuff - used right - is solid and reliable. This looks very like the sort of thing an enthusiast in an open source project would program because it seems a fun challenge, without stopping to think if it's genuinely a good idea.

MartinCarpenter · Post by **MartinCarpenter** » Mon Jun 01, 2020 9:27 am

Joseph Conlon wrote: ↑
Sun May 31, 2020 11:07 pm
There's a lot here (and thanks to Roger for his thoughtful starting post even though I broadly disagree). Some comments:

3. In principle you would expect lichess/chess.com to have much better cheat detection systems than Ken Regan as they have far more data to work with (not least move times). i.e. if someone is cheating in a game, the moves by themselves would give you a certain amount of statistical significance, but moves plus move times should give you more.

Only if move times are actually useful information, which I'm not quite sure about. There are a lot of reasons that people's move times might be a bit random, and only one we actually want to catch (consulting an engine).

Has anyone got a large set of verified move timings from people doing that? Over all the very different time limits?

If they're consulting an engine then the moves will tell you soon enough.

Matthew Turner · Post by **Matthew Turner** » Mon Jun 01, 2020 9:37 am

One very quick observation - there is nothing there about rating; this seems to be a way of trying to measure your objective standard of play, not your play relative to rating.

NickFaulks · Post by **NickFaulks** » Mon Jun 01, 2020 9:40 am

Adam Raoof wrote: ↑
Sun May 31, 2020 1:49 pm
I did actually ask Ken

.....

(ii) it operates with quantities that demonstrably conform well to normal distribution, so that z-scores rather than more generic “p-values” can be given up-front."

That sums up the whole discussion. Most people on this forum seem to believe that this demonstration exists and is convincing, but I cannot find anyone who has actually seen it.
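For context on why the normality claim matters: a z-score is only shorthand for a probability if the underlying quantity really is normal. A minimal sketch of that conversion, using only the standard normal tail formula (the function name is illustrative, not anything from Regan's software):

```python
import math

def upper_tail_p(z: float) -> float:
    """One-sided p-value for a z-score, valid only if the statistic is normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for z in (2.0, 3.0, 4.0, 5.0):
    p = upper_tail_p(z)
    print(f"z = {z:.1f}  ->  p = {p:.2e}  (about 1 in {1 / p:,.0f})")
```

The odds quoted for large z-scores come entirely from the far tail of the assumed distribution, so any departure from normality matters most exactly where accusations are made.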

Joseph Conlon · Post by **Joseph Conlon** » Mon Jun 01, 2020 9:53 am

MartinCarpenter wrote: ↑
Mon Jun 01, 2020 9:27 am
Only if move times are actually useful information, which I'm not quite sure about. There are a lot of reasons that people's move times might be a bit random, and only one we actually want to catch (consulting an engine).

Has anyone got a large set of verified move timings from people doing that? Over all the very different time limits?

If they're consulting an engine then the moves will tell you soon enough.

I'm not sure. Personally, I feel one of the strong 'tells' of engine use is the player who plays a complicated middle game accurately at (say) 5 seconds a move and then continues to spend 5 seconds a move once in an easily won endgame where they are significant material up (in contrast to spending more time on 'critical' positions and being able to quickly finish off won positions). For any single game there may be reasons why this happens, but spread over many games this becomes a statistical feature.

Certainly it would not be hard for online sites to have a very large sample of games where based on moves alone one player was determined to use engine assistance, and then look at the move timings within that.
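One way the flat-tempo "tell" could be put into numbers is the coefficient of variation of a player's move times, averaged over many games. This is a sketch only: the move times below are invented, the function names are hypothetical, and nothing here is known to match what any site actually computes.

```python
from statistics import mean, pstdev

def tempo_spread(move_times):
    """Coefficient of variation of move times: stdev / mean. Near 0 means a flat tempo."""
    return pstdev(move_times) / mean(move_times)

def mean_spread(games):
    """Average the per-game spread, since a single game proves little."""
    return mean(tempo_spread(g) for g in games)

# Hypothetical move times in seconds, invented for illustration only.
varied_game = [2, 3, 25, 40, 4, 2, 18, 1, 1, 1]   # long thinks on critical moves, fast finish
flat_game = [5, 6, 5, 4, 5, 6, 5, 5, 4, 5]        # roughly 5 seconds whatever the position

print(f"varied: {tempo_spread(varied_game):.2f}, flat: {tempo_spread(flat_game):.2f}")
```

A low spread on its own shows nothing, because people naturally play at different tempos; any threshold would have to be calibrated against a large sample of games at each time limit.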

MartinCarpenter · Post by **MartinCarpenter** » Mon Jun 01, 2020 10:20 am

Joseph Conlon wrote: ↑
Mon Jun 01, 2020 9:53 am

MartinCarpenter wrote: ↑
Mon Jun 01, 2020 9:27 am
Only if move times are actually useful information, which I'm not quite sure about. There are a lot of reasons that people's move times might be a bit random, and only one we actually want to catch (consulting an engine).

Has anyone got a large set of verified move timings from people doing that? Over all the very different time limits?

If they're consulting an engine then the moves will tell you soon enough.
I'm not sure. Personally, I feel one of the strong 'tells' of engine use is the player who plays a complicated middle game accurately at (say) 5 seconds a move and then continues to spend 5 seconds a move once in an easily won endgame where they are significant material up (in contrast to spending more time on 'critical' positions and being able to quickly finish off won positions). For any single game there may be reasons why this happens, but spread over many games this becomes a statistical feature.

I'm not honestly convinced by that a priori. It definitely isn't how you'd program an engine to work at some blitz time limits, especially anything without increment.

Obviously we're not (I hope?!) dealing with direct engine move inputs here; it's someone copying stuff across and back again, and that process will produce random timing fluctuations.

People, as a population, definitely naturally play at quite different tempos.

Joseph Conlon wrote: ↑
Mon Jun 01, 2020 9:53 am
Certainly it would not be hard for online sites to have a very large sample of games where based on moves alone one player was determined to use engine assistance, and then look at the move timings within that.

That last would definitely be possible, I suppose, if obviously not fully ideal.

Matthew Turner · Post by **Matthew Turner** » Mon Jun 01, 2020 10:30 am

NickFaulks wrote: ↑
Mon Jun 01, 2020 9:40 am

Adam Raoof wrote: ↑
Sun May 31, 2020 1:49 pm
I did actually ask Ken

.....

(ii) it operates with quantities that demonstrably conform well to normal distribution, so that z-scores rather than more generic “p-values” can be given up-front."
That sums up the whole discussion. Most people on this forum seem to believe that this demonstration exists and is convincing, but I cannot find anyone who has actually seen it.

Nick,
I have a question for you. The idea is that chess is modeled in the same way as a multiple choice exam. Would you accept that if I took results from the 2004 GCSE Maths exam (in terms of marks rather than grades), they would closely approximate a normal distribution?
Matt

MartinCarpenter · Post by **MartinCarpenter** » Mon Jun 01, 2020 10:55 am

Well, what if you gave a few of the students the answers to try and remember beforehand?

NickFaulks · Post by **NickFaulks** » Mon Jun 01, 2020 10:59 am

Matthew Turner wrote: ↑
Mon Jun 01, 2020 10:30 am
Nick,
I have a question for you. The idea is that chess is modeled in the same way as a multiple choice exam. Would you accept that if I took results from the 2004 GCSE Maths exam (in terms of marks rather than grades), they would closely approximate a normal distribution?
Matt

Of course not, but what has that got to do with anything?

There is no reason why Ken's p-values should be normally distributed across the spectrum, but they may be very close and that's good enough for me. Why can't we just take a look and then put the whole thing to bed?

Matthew Turner · Post by **Matthew Turner** » Mon Jun 01, 2020 11:22 am

Nick,
Sorry, are you saying that the results from GCSE Maths wouldn't be normally distributed? Why not? I am open to the idea that that wouldn't be the case, but I don't know the reason.

Martin,
Good question. We'd have more outliers in the upper band than we would expect, so we'd know that some people were cheating. Could we tell the cheats from the gifted academics? Well, that is another question!

MartinCarpenter · Post by **MartinCarpenter** » Mon Jun 01, 2020 11:44 am

Except the ones that look beforehand are somehow legal, and the others looking during the exam aren't.

There are probably also extra extreme negative events, where you play when ill, distracted etc. and everything you try to do is rubbish. It is definitely messier than every move being a purely random selection of move quality from a given underlying distribution.

It'll trend close enough to normal once aggregated up of course. Just another reason to be a little cautious with the numbers.

NickFaulks · Post by **NickFaulks** » Mon Jun 01, 2020 11:46 am

The technical reason is that the Central Limit Theorem, in its simplest form, is based on adding independent variables. Whether you get Q6 of a maths exam right is not independent of whether you get Q7 right - they will have a clear tendency to go together.

In practical terms, let's suppose we find a mean of 70/100 and a sd of 20. Wouldn't you expect a cluster in the 90-100 range, these being the children who can do sums? There will be a long tail at the other end, depending on whether you get punished for guessing wrong.

In the same way, I would not expect moves 16 and 17 in a game of chess to be independent. If you are following a consistent plan which the computer likes they will both be good, if you are following one it doesn't like they will both be bad. [ This is why I don't like going over my games with an engine - so often it seems to be telling me "I wouldn't have started from here" ]. I have no idea how serious this is and would like to know.
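How serious the dependence could be is easy to explore with a toy simulation. In this sketch (all parameters are made up, and the "sticky" model is an assumption, not Regan's), each move is "good" with probability 0.5, but with some probability it simply repeats the quality of the previous move, as when a whole plan is liked or disliked by the engine.

```python
import random
from statistics import pvariance

def match_count(rng, n_moves, p, stickiness):
    """Number of good moves when each move repeats the previous outcome with
    probability `stickiness`, and is otherwise a fresh draw with P(good) = p."""
    good = rng.random() < p
    total = int(good)
    for _ in range(n_moves - 1):
        if rng.random() >= stickiness:
            good = rng.random() < p
        total += good
    return total

def variance_ratio(stickiness, n_moves=40, p=0.5, trials=5000, seed=1):
    """Observed variance of the total, divided by the independent-moves value n*p*(1-p)."""
    rng = random.Random(seed)
    totals = [match_count(rng, n_moves, p, stickiness) for _ in range(trials)]
    return pvariance(totals) / (n_moves * p * (1 - p))

print(f"independent moves: {variance_ratio(0.0):.2f}")
print(f"sticky moves:      {variance_ratio(0.5):.2f}")
```

With independent moves the ratio is close to 1. With a stickiness of 0.5 the variance of the total is close to three times larger, so a z-score computed as if moves were independent would be overstated by a factor of roughly 1.7. The total still looks bell-shaped; it is the spread that is wrong unless the dependence is modelled.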

Roger de Coverly · Post by **Roger de Coverly** » Mon Jun 01, 2020 12:14 pm

NickFaulks wrote: ↑
Mon Jun 01, 2020 11:46 am
I have no idea how serious this is and would like to know.

Don't you need a lot of observations to reject the hypothesis that something is just random? Take a game of 30 moves. The first 10 are theory, so following theory just establishes that the player knows some. If the next twenty moves match an engine choice, that doesn't have to be twenty points of evidence; it might be as few as two or three. I believe the Regan method attempts to allow for this by filtering out moves that are forced and possibly also "only" moves. It does make a difference. If you threw 20 heads in a row, you might conclude as a strong hypothesis that the coin had a bias; you wouldn't on three.
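The coin comparison in numbers, as a short sketch. The 0.5 match probability is the fair-coin figure; the chance of an honest player matching an engine on any given move is not known here and is purely an assumption.

```python
def chance_of_run(n_independent: int, p_match: float = 0.5) -> float:
    """Probability of matching on all n decisions, if they are independent."""
    return p_match ** n_independent

for n in (3, 20):
    prob = chance_of_run(n)
    print(f"{n:2d} independent matches: probability {prob:.2e} (1 in {1 / prob:,.0f})")
```

Three matches happen by chance one time in eight, while twenty happen about one time in a million, so everything turns on how many of the twenty moves count as genuinely independent decisions.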

As regards the maths exam wouldn't it depend on the quality of the examinees and the difficulty of the questions? High quality of candidates coupled with easy questions could see marks only in the 80% to 100% range.

Matthew Turner · Post by **Matthew Turner** » Mon Jun 01, 2020 12:20 pm

NickFaulks wrote: ↑
Mon Jun 01, 2020 11:46 am
The technical reason is that the Central Limit Theorem, in its simplest form, is based on adding independent variables. Whether you get Q6 of a maths exam right is not independent of whether you get Q7 right - they will have a clear tendency to go together.

In practical terms, let's suppose we find a mean of 70/100 and a sd of 20. Wouldn't you expect a cluster in the 90-100 range, these being the children who can do sums? There will be a long tail at the other end, depending on whether you get punished for guessing wrong.

In the same way, I would not expect moves 16 and 17 in a game of chess to be independent. If you are following a consistent plan which the computer likes they will both be good, if you are following one it doesn't like they will both be bad. [ This is why I don't like going over my games with an engine - so often it seems to be telling me "I wouldn't have started from here" ]. I have no idea how serious this is and would like to know.

Nick,
Ken Regan explained how he'd taken account of this with a dependence matrix, but it just doesn't feel that significant to me. If you have enough chunks of data then this will even itself out (assuming the exam isn't ridiculously easy or ridiculously hard). That is why we exclude chess moves where one side has an overwhelming advantage.
In the example that you give I don't agree there would be a cluster at 90-100. I think there would be a lot more at 85 than 95 and a lot more at 75 than 85.
What I would say is that I think the answers in a Maths exam are more inter-dependent than the moves in a chess game, so maybe 100 marks isn't enough to get a (close approximation to the) normal distribution. It seems to me that this is really only about basing a judgment on the right amount of information, not about the process itself.
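For reference, this is what an exactly normal model with the mean of 70 and sd of 20 suggested earlier would predict for the upper bands. It is a sketch of the model only; no real exam data is used.

```python
from statistics import NormalDist

marks = NormalDist(mu=70, sigma=20)   # the mean and sd suggested in the thread

def share(lo, hi):
    """Fraction of candidates the normal model places between lo and hi marks."""
    return marks.cdf(hi) - marks.cdf(lo)

for lo, hi in ((70, 80), (80, 90), (90, 100)):
    print(f"{lo}-{hi}: {share(lo, hi):.1%}")
print(f"above 100 (impossible in a real exam): {1 - marks.cdf(100):.1%}")
```

The model does put fewer candidates in each successive band, but it also places roughly 7% of them above 100 marks. Real marks are capped, so those candidates have to pile up somewhere near the top, and the fit can only be approximate at that end.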
