These graphics are generated by sampling 1,000 games (no handicap, 7.5 komi) from the Baduk Movies pro game collection and choosing one random position from each game, yielding 1,000 positions. I ask Minigo for its policy and value output at each position and compare them against the result of the real game.
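For illustration, here is a minimal sketch of that sampling-and-evaluation loop. The helpers (`game.position_at`, `game.moves`, `game.result`, `model.run`) are hypothetical stand-ins for whatever the actual code in tensorflow/minigo uses, not its real API:

```python
import random

def sample_positions(games, n_games=1000, seed=0):
    """Pick n_games games and one random position from each."""
    rng = random.Random(seed)
    sampled = rng.sample(games, n_games)
    positions = []
    for game in sampled:
        move_idx = rng.randrange(len(game.moves))       # one random position per game
        positions.append((game.position_at(move_idx),   # board state before that move
                          game.moves[move_idx],         # move actually played by the pro
                          game.result))                 # final result of the real game
    return positions

def evaluate_model(model, positions):
    """Compare the model's policy/value output against the real game."""
    records = []
    for position, played_move, result in positions:
        policy, value = model.run(position)             # policy over moves, value in [-1, 1]
        records.append({
            "policy_prob_of_played_move": policy[played_move],
            "value_agrees_with_result": (value > 0) == (result > 0),
        })
    return records
```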
For more methodology details (or general questions), leave a comment in this Google Sheet or check out the code at tensorflow/minigo.
The blue line is the evaluation at each model. The orange line is a moving average of the last 15 models. The bottom graphs are simply zoomed in on the last 40 models.
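A rough sketch of how such a plot could be produced (assumed pandas/matplotlib usage, not the actual Colab code): `scores` is a per-model evaluation series indexed by model number, the orange line is a 15-model rolling mean, and the bottom panel keeps only the last 40 models.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_evaluation(scores):
    s = pd.Series(scores)
    fig, (top, bottom) = plt.subplots(2, 1, figsize=(8, 6))
    for ax, view in ((top, s), (bottom, s.tail(40))):   # bottom panel: last 40 models only
        ax.plot(view.index, view.values, color="tab:blue",
                label="per-model evaluation")
        ax.plot(view.index, view.rolling(15).mean(), color="tab:orange",
                label="15-model moving average")
    top.legend()
    plt.show()
```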
Much additional analysis has been done in Google Colab.