Andrew hasn't updated the MiniGo results in a long time, so here goes...

| Run | Board size | Blocks | Filters | Played on | Models | Games | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| v13 | 19x19 | 20 | 256 | X00 TPU | | | Successes: started from a supervised model similar to AlphaGo Master. |
| v12 | 19x19 | 20 | 256 | X00 TPU | 1000 (exactly) | 24.6M | Successes: reproducibility of v11: proved RL is stable. |
| v11 | 19x19 | 20 | 256 | X00 TPU | 171 | 6.6M | Successes: tested an experiment. Failures: init to Q was very unstable and we stopped the run early. |
| v10 | 19x19 | 20 | 256 | X00 TPU | 865 | 22.3M | Successes: ran on TPU (very fast); we "finished" the run. |
| v9 | 19x19 | 20 | 128 | TPU | 737 | 14.0M | Successes: ran on TPU (very fast); we "finished" the run; learned the importance of random rotation. |
| v8 | 19x19 | 20 | 256 | TPU | 5 | 100K | Successes: proved our TPU kubernetes cluster works. |
| v7 | 19x19 | 20?? | 128 | GPU | 529 | 7.8M | Successes: golden chunks for training, random rotation for training. Failures: forgot to write sgfs for start of run. |
| v5 | 19x19 | 10 | 128 | GPU | 581 | 4.8M | Successes: GPU cluster, strong amateur play. |
| v3 | 9x9 | 10? | 64??? | CPU | 496 | 3.3M | Successes: code all ran and model trained. |
In the beginning there was v3, a 9x9 run. v1 and v2, if they ever existed, are lost to history.
After v3 there was v5. (Note: we never seem to manage two consecutively numbered runs, so roughly half the version numbers are missing.)

Little is known about v5; the archives suggest it was a 10-block, 128-filter architecture with about 5M games played.
Oral history, handed down from site admin to site admin, tells that the operator tested several learning rate changes near the end.

We all love Python, it's a great language, but sometimes you crave speed. v7 used a C++ binary to go, direct quote, "HyperSpeed".
v7 had its successes: better data marshalling, the introduction of Figure 3, bad resign rate graphs, ...
And its issues: we forgot to write sgfs, we cut the learning rate early, ...

It's better not to speak of v8, nor, *shudder*, mention its name: Gradients.

v9 was a 20-block model. It was also the first model to train using the eight symmetries(?). Or was it?
"I physically feel sick" - AMJ, upon discovering that `use_random_rotation` defaults to False, three days in.
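For concreteness, here is a sketch of those eight symmetries (the dihedral group of the square: four rotations, each with an optional mirror) applied to a toy board. This is an illustration only, not MiniGo's actual augmentation code.

```python
def rotate(board):
    """Rotate a square board 90 degrees clockwise."""
    n = len(board)
    return [[board[n - 1 - c][r] for c in range(n)] for r in range(n)]

def flip(board):
    """Mirror a board left-right."""
    return [row[::-1] for row in board]

def symmetries(board):
    """The 8 dihedral symmetries: 4 rotations of the board,
    plus 4 rotations of its mirror image."""
    out = []
    for b in (board, flip(board)):
        for _ in range(4):
            out.append(b)
            b = rotate(b)
    return out

# A 3x3 toy board with no symmetry of its own, so all 8 images differ.
board = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
syms = symmetries(board)
assert len(syms) == 8
assert len({tuple(map(tuple, s)) for s in syms}) == 8  # all distinct
```

Training on a random one of these per position means the net can't overfit to a particular board orientation, which is the lesson v9 taught the hard way.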

Never content, the MiniGo team pushed past "HyperSpeed" straight to "PetaFlops Speed" with v10.
This was the Real Deal: a 20-block, 256-filter full-sized model, blazing on 640 Cloud TPUs.
I consider this our most serious attempt to reproduce AlphaZero:
We used the published learning rate schedule, batch size, ... (TODO ANDREW).
Andrew valiantly monitored the bad resign rate, aggressively keeping it below 5%.
Our eval showed this was a strong model, surpassing our previous top model, and reaching pro strength (v7 may have too?).
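As a rough illustration of that metric (a hypothetical helper, not the code Andrew actually used): the bad resign rate is the fraction of games the eventual winner would have wrongly resigned, had the resign threshold been enforced.

```python
def bad_resign_rate(winner_min_values, threshold=-0.9):
    """Fraction of games where the eventual winner's value estimate
    dipped below the resign threshold at some point, i.e. games
    that would have been wrongly resigned."""
    bad = sum(1 for v in winner_min_values if v < threshold)
    return bad / len(winner_min_values)

# Hypothetical per-game minima of the winner's value estimate.
rate = bad_resign_rate([-0.95, -0.2, 0.1, -0.99, 0.5])
print(rate)  # -> 0.4
```

Keeping this number low matters because resigned games where the "loser" would have won feed systematically wrong value targets back into training.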

I told Andrew "Init to 0 is stupid".
Init to 0 means initializing a new node's value (Q) to 0 (an even position).
I said it then and I'll say it now: this is a bad idea, and it leads to weird behavior:
MCTS explores all 361 moves before spending a second readout on the top policy node.
Still, it's what the paper says, and we expected it to fail quickly, so we tested it.
TL;DR: v11 failed. The win rate wasn't stable and the bad resign rate was impossible to control.
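A toy demonstration of that behavior (a simplified PUCT rule with made-up numbers, not MiniGo's search code): if every real move evaluates as slightly losing, an untouched child sitting at Q = 0 always outscores a visited one, so the search sweeps all 361 moves before revisiting any.

```python
import math

def select(N, Q, P, c_puct=1.0):
    """Pick the child maximizing Q + U (a simplified PUCT rule)."""
    total = sum(N)
    scores = [Q[a] + c_puct * P[a] * math.sqrt(1 + total) / (1 + N[a])
              for a in range(len(N))]
    return scores.index(max(scores))

n_moves = 361
P = [1.0 / n_moves] * n_moves  # uniform priors, for simplicity
N = [0] * n_moves
Q = [0.0] * n_moves            # init to Q = 0: unvisited moves look even

readouts = 0
while True:
    a = select(N, Q, P)
    if N[a] > 0:
        break                  # a move finally got a second readout
    N[a] += 1
    Q[a] = -0.2                # every visited move turns out to be losing
    readouts += 1

print(readouts)  # -> 361: every move explored once before any revisit
```

Initializing unvisited nodes to a loss (or to the parent's value) instead makes the search commit to promising lines far sooner.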

For v12 we tested reproducibility of our model.
We reverted the v11 changes and ran v10 again (changing `virtual_loss=2`).
`virtual_loss` is a parameter we use to speed up self-play by batching 8 (or now 2) positions and evaluating them at the same time.
TL;DR: v12 is similar to v10; this was a test of stability and bootstrap conditions.
We didn't see any measurable differences, so we feel good that our RL setup is stable.
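A sketch of how virtual loss works in general (assumed bookkeeping, not MiniGo's implementation, and ignoring the per-ply sign flips a real search does): each path selected into the current batch is temporarily scored as a loss, so the next selection diverges to a different line instead of picking the same leaf again.

```python
from dataclasses import dataclass

@dataclass
class Node:
    N: float = 0.0  # visit count
    W: float = 0.0  # total value; Q = W / N

VIRTUAL_LOSS = 1.0

def add_virtual_loss(path):
    """Pretend every node on an in-flight path just lost, so the
    next selection in the same batch explores a different line."""
    for node in path:
        node.N += VIRTUAL_LOSS
        node.W -= VIRTUAL_LOSS

def revert_virtual_loss(path, value):
    """Undo the fake loss and record the real evaluation."""
    for node in path:
        node.N += 1.0 - VIRTUAL_LOSS
        node.W += VIRTUAL_LOSS + value

path = [Node(), Node()]
add_virtual_loss(path)          # both nodes briefly look like sure losses
revert_virtual_loss(path, 0.3)  # net effect: one real visit worth +0.3
print(path[0].N, path[0].W)     # -> 1.0 0.3
```

The trade-off is that a larger batch evaluates more positions per network call but distorts the search more while the evaluations are in flight, which is presumably why dropping from 8 to 2 was worth testing.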

Sad stuff