These graphics are generated by sampling 1,000 games (no handicap, 7.5 komi) from the Baduk Movies pro game collection,
then choosing one random position from each game, yielding 1,000 positions. I ask Minigo for its policy and value output at each position and compare them against the real game: the policy against the move the pro played, and the value against the final result.
For more methodology details (or general questions), leave a comment in this Google Sheet or check out the code at tensorflow/minigo.
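The two metrics described above can be sketched roughly as follows. This is a minimal illustration, not the code in tensorflow/minigo: the function names are hypothetical, moves are represented as plain strings, and value error is assumed here to be the mean squared difference between the value output and the game result encoded as +1 or -1 (the page does not state the exact definition).

```python
import random


def sample_positions(game_lengths, seed=0):
    """Pick one random move index per game (hypothetical helper).

    game_lengths: list with the number of moves in each sampled game.
    Returns one index per game, so 1,000 games give 1,000 positions.
    """
    rng = random.Random(seed)
    return [rng.randrange(length) for length in game_lengths]


def policy_accuracy(predicted_moves, pro_moves):
    """Fraction of positions where the top policy move equals the pro's move."""
    hits = sum(p == q for p, q in zip(predicted_moves, pro_moves))
    return hits / len(pro_moves)


def value_error(values, outcomes):
    """ASSUMPTION: mean squared error between value output and result (+1/-1)."""
    squared = [(v - z) ** 2 for v, z in zip(values, outcomes)]
    return sum(squared) / len(squared)
```

For example, `policy_accuracy(["D4", "Q16", "C3"], ["D4", "Q16", "R4"])` counts two matching moves out of three positions.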
The blue line is the evaluation of each model. The orange line is a moving average over the last 15 models.
The bottom graphs are simply zoomed in on the last 40 models.
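A trailing moving average like the orange line could be computed along these lines. This is a sketch under assumptions: the function name is hypothetical, and the handling of the first few models (averaging over however many are available so far) is a guess, since the page does not say how the start of the series is treated.

```python
def trailing_moving_average(values, window=15):
    """Average each point with up to `window - 1` preceding points.

    ASSUMPTION: early points use a shorter window rather than being dropped.
    """
    averaged = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        averaged.append(running / min(i + 1, window))
    return averaged
```

The zoomed-in view then corresponds to slicing the last 40 entries of both the raw and the averaged series, for example `averaged[-40:]`.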
Much additional analysis has been done in Google Colab.
[Chart: Accuracy in Prediction Moves from Pro Dataset. x-axis: Model Number; y-axis: Accuracy.]
[Chart: Value Error in Outcome of Game from Pro Dataset. x-axis: Model Number; y-axis: Value Error.]
[Chart: Accuracy in Prediction Moves from Pro Dataset, runs v9-19x19 through v17-19x19. x-axis: Model Number; y-axis: Accuracy.]
[Chart: Value Error in Outcome of Game from Pro Dataset, runs v9-19x19 through v17-19x19. x-axis: Model Number; y-axis: Value Error.]