Cloudy Go README/RESULTS

Andrew hasn't updated the MiniGo RESULTS.md in a long time, so here goes...

| Run | Board size | Blocks | Filters | Played on | Number of models | Number of games | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| v17 | 19 | 20 | 256 | X00 TPU | | | Running: using the new Squeeze-and-Excitation network. |
| v16 | 19 | 40 | 256 | X00 TPU | 1186 | 29.3M | Meh: increased block count to 40. No strength gain at equal playouts, and likely worse at time parity. |
| v15 | 19 | 20 | 256 | X00 TPU | 1008 | 25.6M | Big success: our fastest start, using our new Bigtable pipeline. Init Q to loss had made v14 stronger, so v15 used it from the start. This run got stronger than any previous run. At the end we played a series of games against LZ (going 50-50 vs LZ201) and ELF (40-60% win rate depending on model), which was awesome. |
| v14 | 19 | 20 | 256 | X00 TPU | | | Started as a test of our new Cloud Bigtable data pipeline. Around model 475 Andrew switched from init Q to parent to init Q to loss, which seems to have had a positive impact on policy and value sharpness, similar to what's seen in ELF. |
| v13 | 19 | 21 | 256 | X00 TPU | 704 | 23.0M | Success: started from a supervised model, similar to AlphaGo Master. |
| v12 | 19 | 20 | 256 | X00 TPU | 1000 (exactly) | 24.6M | Success: reproduced v10, proving our RL setup is stable. |
| v11 | 19 | 20 | 256 | X00 TPU | 171 | 6.6M | Success: tested an experiment. Failure: init Q to 0 was very unstable and we stopped the run early. |
| v10 | 19 | 20 | 256 | X00 TPU | 865 | 22.3M | Successes: ran on TPUs (very fast); we "finished" the run. |
| v9 | 19 | 20 | 128 | TPU | 737 | 14.0M | Successes: ran on TPUs (very fast); we "finished" the run; learned the importance of random rotation. |
| v8 | 19 | 20 | 256 | TPU | 5 | 100K | Success: proved our TPU Kubernetes cluster works. |
| v7 | 19 | 20 | 128 | GPU | 529 | 7.8M | Successes: golden chunks for training, random rotation for training. Failure: forgot to write SGFs for the start of the run. |
| v5 | 19 | 20 | 128 | GPU | 581 | 4.8M | Successes: GPU cluster, strong amateur play. |
| v3 | 9 | 10 | 32 | CPU | 496 | 3.3M | Success: the code all ran and a model trained. |
In the beginning there was v3, a 9x9 run. v2 and v1, if they exist, are lost to history.
After v3 there was v5. Note: we seem unable to start two runs in a row, so roughly half the version numbers are missing.

Little is known about v5; the archives suggest it was a 10-block, 128-filter architecture with around 5M games played.
Oral history, handed down from site admin to site admin, tells that the operator tested several learning rate changes near the end.

We all love Python, it's a great language, but sometimes you crave speed. v7 used a C++ binary to go, direct quote, "HyperSpeed".
v7 had its successes: better data marshalling, introduction of Figure 3, bad resign rate graphs, ...
And its issues: we forgot to write SGFs, we cut the learning rate early, ...

It's better not to speak of v8, nor *shudder* to mention its name: Gradients.

v9 was a 20-layer model. It was also the first model to train using the eight symmetries(?). Or was it?
"I physically feel sick" - AMJ, upon discovering `use_random_rotation` defaults to False three days in.

Never content, the MiniGo team pushed past "HyperSpeed" straight to "PetaFlops Speed" with v10.
This was the Real Deal: a 20-layer, 256-filter, full-sized model, blazing along on 640 Cloud TPUs.
I consider this our most serious attempt to reproduce AlphaZero:
We used the published learning rate schedule, batch size, ... (TODO ANDREW).
Andrew valiantly monitored the bad resign rate, aggressively keeping it below 5%.
Our eval showed this was a strong model, surpassing our previous top model and reaching pro strength (v7 may have too?).
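
For reference, the bad resign rate is measured AlphaZero-style: a fraction of selfplay games is played with resignation disabled, and you count how often a player whose value dropped below the resign threshold would have gone on to win anyway. A rough sketch of that bookkeeping, with hypothetical field names rather than MiniGo's actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class HoldoutGame:
    """A selfplay game played with resignation disabled, kept for calibration."""
    min_value_black: float  # lowest search value black saw during the game
    min_value_white: float  # lowest search value white saw during the game
    black_won: bool

def bad_resign_rate(games, resign_threshold=-0.90):
    """Fraction of holdout games in which a resignation would have been wrong.

    A resignation is "bad" when a player's value dropped below the threshold
    (so they would have resigned) but that player went on to win anyway.
    Exact denominators vary; this counts over all holdout games.
    """
    if not games:
        return 0.0
    bad = 0
    for g in games:
        black_resigns = g.min_value_black < resign_threshold
        white_resigns = g.min_value_white < resign_threshold
        if (black_resigns and g.black_won) or (white_resigns and not g.black_won):
            bad += 1
    return bad / len(games)
```

Keeping that number under ~5% is what justifies an aggressive resign threshold without throwing away games the "loser" could still have won.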

I told Andrew "Init to 0 is stupid".
Init to 0 means initializing a new node's value (Q) to 0 (an even position).
I said it then and I'll say it now: this is a bad idea, and it leads to weird behavior:
MCTS explores all 361 moves before spending a 2nd readout on the top policy node.
Still, it's what the paper says, and we expected it to fail quickly, so we tested it.
TL;DR: v11 failed. Win rate wasn't stable and the bad resign rate was impossible to control.
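
To make the weirdness concrete, here is a stripped-down version of PUCT child selection; a sketch only, not MiniGo's mcts.py, and c_puct and the exact formula are assumptions:

```python
import math

def puct_score(child_q, child_prior, child_visits, parent_visits,
               init_q, c_puct=1.5):
    """Score one child for selection; unvisited children fall back to init_q.

    init_q =  0.0  -> "init to even": an unvisited move looks as good as an
                      even position.
    init_q = -1.0  -> "init to loss": an unvisited move looks lost until its
                      prior (the U term) argues for trying it.
    """
    q = child_q if child_visits > 0 else init_q
    u = c_puct * child_prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

def select_child(children, parent_visits, init_q):
    """children: list of (q, prior, visits) tuples; returns the best index."""
    return max(range(len(children)),
               key=lambda i: puct_score(*children[i], parent_visits, init_q))
```

With a roughly flat prior and init_q = 0.0, every unvisited sibling outscores the just-visited child (whose U term has already been halved by the 1 + visits denominator), so selection walks across all 361 points before giving any move a second readout; with init_q = -1.0 the visited child stays on top.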

For v12 we tested reproducibility of our model.
We reverted the v11 changes and ran v10 again (with virtual_loss changed to 2).
virtual_loss is a parameter we use to speed up search by batching 8 (or now 2) positions and evaluating them at the same time.
TL;DR: v12 is similar to v10; this was a test of stability and bootstrap conditions.
We didn't see any measurable differences, so we feel good that our RL setup is stable.
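
For the curious, virtual loss is the standard trick for evaluating several leaves of one search tree in a single network batch: each selected path temporarily gets a pretend losing visit so the next descent picks a different leaf. A rough sketch of the idea, assuming a hypothetical Node structure (not the MiniGo implementation; per-ply sign handling is omitted):

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: list = field(default_factory=list)

    def best_child(self):
        # Stand-in for the real PUCT selection over self.children.
        return max(self.children,
                   key=lambda c: (c.value_sum / c.visits if c.visits else 0.0)
                   + 1.5 * c.prior * math.sqrt(self.visits) / (1 + c.visits))

def collect_leaves(root, virtual_loss=2):
    """Select `virtual_loss` leaves for one batched network evaluation.

    After each selection the path is given a pretend losing visit, so the
    next descent is steered toward a different part of the tree instead of
    picking the same leaf again.
    """
    batch = []
    for _ in range(virtual_loss):
        node, path = root, [root]
        while node.children:              # descend to a leaf
            node = node.best_child()
            path.append(node)
        for n in path:                    # apply the virtual (pretend) loss
            n.visits += 1
            n.value_sum -= 1
        batch.append((node, path))
    return batch

def backup_batch(batch, values):
    """Replace each path's virtual loss with the real value from the network."""
    for (leaf, path), value in zip(batch, values):
        for n in path:
            n.value_sum += value + 1      # undo the pretend loss, add the real value
            # the visit added during selection is kept as the real visit
```

Lowering it from 8 to 2 means less distortion of the search from the pretend losses, at the cost of smaller evaluation batches.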

Sad stuff