These graphics are generated by sampling 1,000 games (no handicap, 7.5 komi) from the baduk movies pro game collection,
then choosing one random position from each game, yielding 1,000 positions. I ask Minigo for its policy and value output at each position and compare them against the real game: the policy against the move the pro played, the value against the result of the game.
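The sampling and scoring described above can be sketched roughly as follows. This is a minimal illustration and not the code in tensorflow/minigo: the function names are hypothetical, games are assumed to be plain lists of moves, and the value error is assumed to be a mean squared error, which the page does not state.

```python
import random


def sample_positions(games, seed=0):
    """Pick one random move index per game.

    `games` is assumed to be a list of move lists; empty games are skipped.
    Returns (game_index, move_index) pairs.
    """
    rng = random.Random(seed)
    return [(i, rng.randrange(len(moves)))
            for i, moves in enumerate(games) if moves]


def policy_accuracy(predicted_moves, pro_moves):
    """Fraction of positions where the model's top move matches the pro's move."""
    matches = sum(p == q for p, q in zip(predicted_moves, pro_moves))
    return matches / len(pro_moves)


def value_error(values, outcomes):
    """Mean squared error between value outputs and game results.

    Assumption: squared error with outcomes encoded as +1 / -1; the actual
    metric behind the charts may differ.
    """
    return sum((v - z) ** 2 for v, z in zip(values, outcomes)) / len(outcomes)
```

With 1,000 sampled positions, each model contributes one accuracy number and one value error number, which is a single point on each chart.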
For more methodology details (or general questions), leave a comment in this Google Sheet or check out the code at tensorflow/minigo.
The blue line is the evaluation of each model. The orange line is a moving average over the last 15 models.
The bottom graphs are simply zoomed in on the last 40 models.
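For reference, a trailing moving average like the orange line could be computed as in the sketch below. The function name is hypothetical, and averaging over fewer points for the first 14 models is an assumption, since the page does not say how the start of the series is handled.

```python
def trailing_moving_average(values, window=15):
    """Average each value with up to `window - 1` values before it.

    Early entries are averaged over however many values exist so far
    (an assumption about how the start of the series is handled).
    """
    averaged = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        averaged.append(running / min(i + 1, window))
    return averaged
```

A zoomed-in view like the bottom graphs is then just the last 40 entries of each series, for example `series[-40:]`.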
Much additional analysis has been done in Google Colab.
[Figure: four line charts. "Accuracy in Prediction Moves from Pro Dataset" (x-axis: Model Number, y-axis: Accuracy) and "Value Error in Outcome of Game from Pro Dataset" (x-axis: Model Number, y-axis: Value Error), each shown twice. Legend series: v9-19x19 through v17-19x19.]