Chess Player Strip Searched

Roger de Coverly
Posts: 21334
Joined: Tue Apr 15, 2008 2:51 pm

Re: Chess Player Strip Searched

Post by Roger de Coverly » Thu Jan 17, 2013 3:29 pm

Geoff Chandler wrote: That link to the GM petition proves that the top players agree that there is something
going on and FIDE need to act before the OTB game at the very top level is destroyed by paranoia.
It's reasonably obvious that an online correspondence site facilitates the consultation of computers between moves. It's legal even on RHP to consult databases and published theory, including online theory. So if you also want to ban the use of computers as blunder checks, that's down to the site. ICCF don't bother. If you want to detect players entering moves from the engine of their choice, you need measures for that as well. The very valid point is that you don't go to a site just to play against a computer engine.

Given that OTB chess has always had, adjournments excepted, a prohibition against seeking external advice, you first need to establish the nature of the external advice before throwing accusations at any player daring to beat a Grandmaster or two. The Feller case was a wake-up call, given the presumed low-tech nature of the communication, as it's a method that doesn't rely on computers for transmission of the advice. The difference now, compared to the past, is that the advice is likely to be of a higher quality.

Steve Collyer
Posts: 54
Joined: Tue Aug 12, 2008 8:07 am

Re: Chess Player Strip Searched

Post by Steve Collyer » Thu Jan 17, 2013 3:31 pm

Engine outputs vary somewhat.
It's not a case of
13 out of 20 top 1 match
16 out of 20 top 2 match
18 out of 20 top 3 match

In practice, it's more a case of
520 out of 800 top 1 match
640 out of 800 top 2 match
720 out of 800 top 3 match

I wouldn't get too obsessed with a 65% top-1 match indicating obvious cheating. As you know, engine analysis does vary somewhat between engines, the player may be flattering their own contribution to their games by not blindly selecting the first-choice move all the time, and there are probably several other causes I've forgotten. But the main point is that a player matching 65/80/90% for top 1, 2 & 3 engine choice moves in a large sample of games does not play stronger chess than Fischer, Kasparov, Kramnik etc etc, but crucially far more engine-like chess than any of the all-time greats when analysed using the same approach.
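For what it's worth, here's a rough sketch (Python, with made-up data - not the actual Batch Analyzer format) of the kind of top-1/2/3 counting being described. The position list is invented purely for illustration.

# Toy illustration of top-N match counting over a sample of positions.
# Each entry is (move actually played, engine's choices in ranked order).
positions = [
    ("Nf3", ["Nf3", "d4", "c4", "g3"]),
    ("e5",  ["c5", "e5", "e6", "c6"]),
    ("Bb5", ["d4", "Nc3", "Bb5", "Bc4"]),
    # ... hundreds more moves in a real sample
]

def match_rates(positions, max_n=3):
    """Cumulative top-1..top-N match percentages."""
    total = len(positions)
    return {
        n: 100.0 * sum(1 for played, ranked in positions if played in ranked[:n]) / total
        for n in range(1, max_n + 1)
    }

for n, pct in match_rates(positions).items():
    print(f"top {n} match: {pct:.1f}%")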

On chess.com a couple of statisticians had severe reservations about the methodology, just as you do.
They analysed hundreds of games themselves & came to the same conclusions I have - the system works.
One of them, called Gerd, actually published his own auto-analysis tool:
http://www.chess.com/download/view/chessanalyse-26
Another, a statistician & programmer called Kris, refined the Batch Analyzer program's output to eliminate obvious/forced moves from both the benchmarks & the analysis of suspected cheats.

In other words, the more the cynical statisticians looked at the methodology, the more it made sense.
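By way of illustration, a crude sketch of the sort of obvious/forced-move filter described above. Kris's actual criteria aren't spelled out here, so the record layout and the 150-centipawn cutoff are purely illustrative assumptions.

# Drop positions that say nothing about who chose the move: either there was
# only one legal reply, or the best move was so far ahead of the second best
# that anyone would have played it. The cutoff is an assumption, not Kris's figure.
FORCED_GAP_CP = 150

def is_forced(record):
    """record: dict with 'legal_moves' (count) and 'evals' (ranked centipawn scores)."""
    if record["legal_moves"] <= 1:
        return True
    evals = record["evals"]
    return len(evals) > 1 and (evals[0] - evals[1]) >= FORCED_GAP_CP

def filter_sample(records):
    return [r for r in records if not is_forced(r)]

# The match rates would then be computed only over filter_sample(records),
# for both the benchmark games and a suspect's games.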

Roger de Coverly
Posts: 21334
Joined: Tue Apr 15, 2008 2:51 pm

Re: Chess Player Strip Searched

Post by Roger de Coverly » Thu Jan 17, 2013 3:50 pm

Steve Collyer wrote: In other words, the more the cynical statisticians looked at the methodology, the more it made sense.
I'm just using general reasoning rather than statistics.

Engine A analyses old games and finds 13 out of 20 moves match its first choice. That means 7 from 20 don't match. So check with engines B and C. They also find 13 out of 20 matching. But are they the same moves? If they aren't, then the effective match-up rate is that much higher. The method of using only one engine makes the tacit assumption that all engines choose the same move, so the 7 from 20 cases where the human's move doesn't match are treated as moves no engine would have chosen, and the 13 that do match as evidence of cheating.
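To make the point concrete, a small sketch (Python, with invented moves) of the difference between matching one engine and matching at least one of several:

# For each position: the move played, plus the first choice of engines A, B and C.
# The moves below are invented for illustration only.
sample = [
    ("Nf3", {"A": "Nf3", "B": "d4",  "C": "Nf3"}),
    ("e5",  {"A": "c5",  "B": "e5",  "C": "c5"}),
    ("Bb5", {"A": "Bb5", "B": "Bb5", "C": "Bc4"}),
    ("h3",  {"A": "a4",  "B": "Re1", "C": "d4"}),
]

single = sum(1 for played, picks in sample if played == picks["A"])
union = sum(1 for played, picks in sample if played in picks.values())
none_match = sum(1 for played, picks in sample if played not in picks.values())

print(f"matched engine A's first choice: {single}/{len(sample)}")
print(f"matched at least one engine:     {union}/{len(sample)}")
print(f"matched no engine at all:        {none_match}/{len(sample)}")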

MartinCarpenter
Posts: 3053
Joined: Tue May 24, 2011 10:58 am

Re: Chess Player Strip Searched

Post by MartinCarpenter » Thu Jan 17, 2013 4:01 pm

I'm not sure over-the-board cheating move patterns will work quite the same way - you've got much, much less useful communication possible with the computer. Even reliably transmitting one move isn't trivial, and you'd certainly never get more than one option.

Roger de Coverly
Posts: 21334
Joined: Tue Apr 15, 2008 2:51 pm

Re: Chess Player Strip Searched

Post by Roger de Coverly » Thu Jan 17, 2013 4:03 pm

Steve Collyer wrote: but crucially far more engine-like chess than any of the all-time greats when analysed using the same approach.
If played over the board without external assistance, that isn't cheating, however much the engine match advocates might wish it to be so.

It's a point Matthew Turner made. If you analyse previous games with engines, over time you get a feel for what they recommend and how they evaluate positions. So if you have lost a pawn or sacrificed one for activity, you suspect from checking similar positions with an engine that you have adequate compensation.

Hair-raising tactical sequences arising in early middlegames may well be engine generated, but these are worked out prior to the game rather than while the game is taking place. It's suspected the recent Wijk Aronian-Anand game was unused preparation from the Gelfand match.

Steve Collyer
Posts: 54
Joined: Tue Aug 12, 2008 8:07 am

Re: Chess Player Strip Searched

Post by Steve Collyer » Thu Jan 17, 2013 4:09 pm

Roger de Coverly wrote:
Steve Collyer wrote: In other words, the more the cynical statisticians looked at the methodology, the more it made sense.
I'm just using general reasoning rather than statistics.

Engine A analyses old games and finds 13 out of 20 moves match its first choice. That means 7 from 20 don't match. So check with engines B and C. They also find 13 out of 20 matching. But are they the same moves? If they aren't, then the effective match-up rate is that much higher. The method of using only one engine makes the tacit assumption that all engines choose the same move, so the 7 from 20 cases where the human's move doesn't match are treated as moves no engine would have chosen, and the 13 that do match as evidence of cheating.
Of course different engines will often choose different 1st, 2nd, 3rd & 4th choice moves, although more often than not they will be the same moves, just shuffled in order. This is because in many positions there may be only a few centipawns' worth of difference between the evaluations of the top 4 moves.
This problem is circumvented by having a large sample of analysed moves.
The same benchmark games have been analysed by different people using different engines on very different systems. The overall match rates are, as I said earlier, remarkably similar - hence the benchmark thresholds.
If what you are saying is true, then it stands to reason that the thresholds wouldn't exist as they do - different analyses of the same batches of games would be giving wildly different results. But they don't.
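To see how narrow those centipawn gaps typically are, here is a minimal multi-PV probe using the python-chess library and any UCI engine. The engine name ("stockfish"), the depth and the number of lines are assumptions - substitute whatever you actually run.

# Print the top four choices in a position and their gap to the best move.
# Requires the python-chess package and a UCI engine binary on the PATH.
import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # assumption: engine on PATH

board = chess.Board()  # or chess.Board(some_fen) for a middlegame position
infos = engine.analyse(board, chess.engine.Limit(depth=18), multipv=4)

best_cp = infos[0]["score"].relative.score(mate_score=100000)
for rank, info in enumerate(infos, start=1):
    cp = info["score"].relative.score(mate_score=100000)
    move = board.san(info["pv"][0])
    print(f"choice {rank}: {move:6} {cp:+5d} cp (gap to top: {best_cp - cp} cp)")

engine.quit()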

Steve Collyer
Posts: 54
Joined: Tue Aug 12, 2008 8:07 am

Re: Chess Player Strip Searched

Post by Steve Collyer » Thu Jan 17, 2013 4:20 pm

Roger de Coverly wrote:
Steve Collyer wrote: but crucially far more engine-like chess than any of the all-time greats when analysed using the same approach.
If played over the board without external assistance, that isn't cheating, however much the engine match advocates might wish it to be so.

It's a point Matthew Turner made. If you analyse previous games with engines, over time you get a feel for what they recommend and how they evaluate positions. So if you have lost a pawn or sacrificed one for activity, you suspect from checking similar positions with an engine that you have adequate compensation.

Hair-raising tactical sequences arising in early middlegames may well be engine generated, but these are worked out prior to the game rather than while the game is taking place. It's suspected the recent Wijk Aronian-Anand game was unused preparation from the Gelfand match.
To have any meaningful impact on a large sample set, this engine-vs-engine scenario would have to occur on a regular basis. You also need an opponent who kindly plays the pre-analysed lines. Anyone who's ever read a modern opening book or used an engine will know how ridiculous this all is. 2 moves out of book & with the best preparation in the world the opponent can often play a thoroughly decent move which you haven't prepared for. You'd need to analyse/memorise hundreds of lines for this to have any impact. And in the first instance the opponent needs to always play a reliable variation of the opening.
A sort of conspiracy of engine prep between you & your opponent, if you will. :D
How come Carlsen only managed
Top 1 Match: 477/828 ( 57.6% )
Top 2 Match: 617/828 ( 74.5% )
Top 3 Match: 690/828 ( 83.3% )
Top 4 Match: 732/828 ( 88.4% )
in his 20 most recent games against 2600+ FIDE opposition, all of which had more than 20 non-database moves?
Was his team not well-prepared? Does my copy of Houdini on a fast quad-core PC have a fault?
Maybe engines play a significantly different style to the best human players, as I said earlier?

This is a bit like the excuse some online cheats have used to explain away high match rates in all their games, regardless of opponent, opening, sub-optimal opposition moves etc: "I have a private database of 6 million engine vs engine games which I referred to." Total nonsense, of course.

Also, you can't have it both ways, Roger.
First you say that maybe all this OTB stuff can be pre-analysed, leading to a false positive, then you say that different engines surely give different moves as their top choices.
Do top OTB players kindly tell each other what engines they're using on what spec systems & what openings they are prepping for?

Roger de Coverly
Posts: 21334
Joined: Tue Apr 15, 2008 2:51 pm

Re: Chess Player Strip Searched

Post by Roger de Coverly » Thu Jan 17, 2013 4:31 pm

Steve Collyer wrote: If what you are saying is true, then it stands to reason that the thresholds wouldn't exist as they do - different analyses of the same batches of games would be giving wildly different results. But they don't.
I'm saying that the 65% benchmark is unconvincing because perhaps 85% to 90% of moves would be chosen by at least one engine. At that sort of level, for over-the-board play where cheating is unexpected, I'm not really sure the test is particularly convincing at separating "strong player" from "strong player using an engine" from "weak player using an engine".

If you had a plausible cheating method, then a high match-up rate is corroboration, particularly if you can reverse engineer the settings and the program actually used.

MartinCarpenter
Posts: 3053
Joined: Tue May 24, 2011 10:58 am

Re: Chess Player Strip Searched

Post by MartinCarpenter » Thu Jan 17, 2013 4:32 pm

Carlsen isn't a great example for this, as he very definitely doesn't go in for very forcing lines at all often. Topalov from a few years back (when he was doing so well) is the sort of player who could well come out a little bit higher, as he did get quite a few games with very deep, sharp preparation which came quite close to winning at times.

Roger de Coverly
Posts: 21334
Joined: Tue Apr 15, 2008 2:51 pm

Re: Chess Player Strip Searched

Post by Roger de Coverly » Thu Jan 17, 2013 4:47 pm

Steve Collyer wrote: 2 moves out of book & with the best preparation in the world the opponent can often play a thoroughly decent move which you haven't prepared for.
Depends what you define as book. Even at my level (190/2050), you don't memorise opening theory unless you have to because there's only one viable tactical sequence. What you do is establish positional ideas, such as where to place the pieces and what the possible tactics are. So you get games which are nominally original, but in practice just recycle previously seen ideas.
Steve Collyer wrote:
Maybe engines play a significantly different style to the best human players, as I said earlier?
Different engines have different styles. But that's the point: why use just the one engine as a benchmark?

Steve Collyer wrote: First you say that maybe all this OTB stuff can be pre-analysed, leading to a false positive, then you say that engines surely give different moves as top choices.
It's the same point. Basically, if both humans and computer engines select good moves, the natural match-up rates are rather higher than checking against a single engine gives credit for. So if you choose six engines, how many moves do you find by human players that none of the engines recommend? Is there any research on that point?

As far as the Carlsen analysis is concerned, you are saying that 351 out of 828 moves didn't match Houdini. But the question I am asking is: how many of the 351 didn't match any computer engine? It may even be a fault of the computer engines - that Carlsen's judgement is in fact better than the logic programmed into them.

Steve Collyer
Posts: 54
Joined: Tue Aug 12, 2008 8:07 am

Re: Chess Player Strip Searched

Post by Steve Collyer » Thu Jan 17, 2013 6:21 pm

MartinCarpenter wrote:Carlsen isn't a great example for this, as he very definitely doesn't go in for very forcing lines at all often. Topalov from a few years back (when he was doing so well) is the sort of player who could well come out a little bit higher, as he did get quite a few games with very deep, sharp preparation which came quite close to winning at times.
The games of Topalov, along with those of all the other super-GMs, have already been analysed. They fall within the thresholds.

Steve Collyer
Posts: 54
Joined: Tue Aug 12, 2008 8:07 am

Re: Chess Player Strip Searched

Post by Steve Collyer » Thu Jan 17, 2013 6:32 pm

As for Carlsen not matching Houdini's first choice more than 57.6% of the time, it's not surprising at all. Just analyse any great player's games with any engine in multi-PV mode & you will see how often there are several reasonable moves, all within a few centipawns of each other, in most positions.
As I say, the best human players play to plans & don't think in terms of fractional score differences between moves at 20-ply depths!
What the testing is used for is to determine legitimate, human-achievable thresholds where an engine hasn't been used, then compare those to the hundreds of unknown online geniuses who top server rating charts.
When different analysts using different engines & systems independently get extremely similar benchmarks, you have to assume that the system works!
I really think you're arguing for argument's sake, Roger. I've done this for over 5 years & believe it or not I do know what I'm talking about, at least when it comes to creating benchmarks with this methodology & applying those to online chess cheat suspects.

Steve Collyer
Posts: 54
Joined: Tue Aug 12, 2008 8:07 am

Re: Chess Player Strip Searched

Post by Steve Collyer » Thu Jan 17, 2013 6:42 pm

I should also say that the reason I use Houdini for this analysis is that it is the strongest engine available & simply refuses to crash during batch analysis.
I've tried many engines - Deep Fritz, Deep Rybka, Shredder & a few others. The only engine that seems to occasionally give strange results (either too high or too low) is Stockfish.

If I used a weaker engine than Houdini, I wonder if you'd then be questioning the validity of my analysis as a result of not using the strongest freely available program? Just a thought...

Mick Norris
Posts: 10390
Joined: Tue Apr 17, 2007 10:12 am
Location: Bolton, Greater Manchester

Re: Chess Player Strip Searched

Post by Mick Norris » Thu Jan 17, 2013 6:58 pm

Steve Collyer wrote:I really think you're arguing for argument's sake, Roger.
:lol:
Any postings on here represent my personal views

Roger de Coverly
Posts: 21334
Joined: Tue Apr 15, 2008 2:51 pm

Re: Chess Player Strip Searched

Post by Roger de Coverly » Thu Jan 17, 2013 7:06 pm

Steve Collyer wrote: at least when it comes to creating benchmarks with this methodology & applying those to online chess cheat suspects.
We're not really talking here about players using an engine for online blitz or for blunder-checking in online correspondence. If there are no witnesses, it's always a plausible hypothesis that there's an engine running, particularly if other online behaviour corroborates it.

The hypothesis put forward by some writers is that in over-the-board chess, which takes place with witnesses, a player can be suspected of cheating if his moves match an arbitrary computer engine more than 13 times out of 20, or 650 out of 1000. The problem with this assertion is that you can demonstrate the 13 matching moves, but if you use other engines, you get a different 13 moves.

Steve Collyer wrote:As I say, the best human players play to plans & don't think in terms of fractional score differences between moves at 20 ply depths!
Is it not the case that different engines will also rank different moves as their top choice? So if top players only match 13 out of 20 measured against a single engine, does that give meaningful information?