Cloudy Go README/RESULTS

Andrew hasn't updated the MiniGo RESULTS.md in a long time, so here goes...

| Run | Board size | Blocks | Filters | Played on | Number of models | Number of games | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| v17 | 19 | 20 | 256 | X00 TPU | | | Running: using the new Squeeze-and-Excitation network. |
| v16 | 19 | 40 | 256 | X00 TPU | 1186 | 29.3M | Meh: increased block count to 40. No strength gain at equal playouts, and likely worse at time parity. |
| v15 | 19 | 20 | 256 | X00 TPU | 1008 | 25.6M | Big success: our fastest start, using our new Bigtable pipeline. Init Q to loss had made v14 stronger, so v15 used it from the start. This run got stronger than any previous run. At the end we played a series of games against LZ (going 50-50 vs LZ201) and ELF (40-60% win rate depending on model), which was awesome. |
| v14 | 19 | 20 | 256 | X00 TPU | | | Started as a test of our new Cloud Bigtable data pipeline. Around model 475 Andrew switched from init Q to parent to init Q to loss, which seems to have had a positive impact on policy and value sharpness, similar to what's seen in ELF. |
| v13 | 19 | 21 | 256 | X00 TPU | 704 | 23.0M | Success: started from a supervised model, similar to AlphaGo Master. |
| v12 | 19 | 20 | 256 | X00 TPU | 1000 (exactly) | 24.6M | Success: reproduced v10, proving our RL setup is stable. |
| v11 | 19 | 20 | 256 | X00 TPU | 171 | 6.6M | Success: tested an experiment. Failure: init Q to 0 was very unstable and we stopped the run early. |
| v10 | 19 | 20 | 256 | X00 TPU | 865 | 22.3M | Successes: ran on TPUs (very fast); we "finished" the run. |
| v9 | 19 | 20 | 128 | TPU | 737 | 14.0M | Successes: ran on TPUs (very fast); we "finished" the run; learned the importance of random rotation. |
| v8 | 19 | 20 | 256 | TPU | 5 | 100K | Success: proved our TPU Kubernetes cluster works. |
| v7 | 19 | 20 | 128 | GPU | 529 | 7.8M | Successes: golden chunks for training, random rotation for training. Failure: forgot to write SGFs for the start of the run. |
| v5 | 19 | 20 | 128 | GPU | 581 | 4.8M | Successes: GPU cluster, strong amateur play. |
| v3 | 9 | 10 | 32 | CPU | 496 | 3.3M | Success: the code all ran and a model trained. |
In the beginning there was v3, a 9x9 run. v2 and v1, if they exist, are lost to history.
After v3 there was v5. Note: we seem unable to start two runs in a row, so roughly half the version numbers are missing.

Little is known about v5; the archives suggest it was a 10-block, 128-filter architecture with around 5M games played.
Oral history, handed down from site admin to site admin, tells that the operator tested several learning rate changes near the end.

We all love Python, it's a great language, but sometimes you crave speed. v7 used a C++ binary to go, direct quote, "HyperSpeed".
v7 had its successes: better data marshalling, introduction of Figure 3, bad resign rate graphs, ...
And its issues: we forgot to write SGFs, we cut the learning rate early, ...

It's better not to speak of v8, nor *shudder* to mention its name: Gradients.

v9 was a 20-layer model. It was also the first model to train using the eight symmetries(?). Or was it?
"I physically feel sick" - AMJ, upon discovering `use_random_rotation` defaults to False three days in.

Never content, the MiniGo team pushed past "HyperSpeed" straight to "PetaFlops Speed" with v10.
This was the Real Deal: a 20-layer, 256-filter, full-sized model, blazing along on 640 Cloud TPUs.
I consider this our most serious attempt to reproduce AlphaZero:
We used the published learning rate schedule, batch size, ... (TODO ANDREW).
Andrew valiantly monitored the bad resign rate, aggressively keeping it below 5%.
Our eval showed this was a strong model, surpassing our previous top model and reaching pro strength (v7 may have too?).
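
For reference, the bad resign rate is measured AlphaZero-style: a fraction of selfplay games is played with resignation disabled, and you count how often a player whose value dropped below the resign threshold would have gone on to win anyway. A rough sketch of that bookkeeping, with hypothetical field names rather than MiniGo's actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class HoldoutGame:
    """A selfplay game played with resignation disabled, kept for calibration."""
    min_value_black: float  # lowest search value black saw during the game
    min_value_white: float  # lowest search value white saw during the game
    black_won: bool

def bad_resign_rate(games, resign_threshold=-0.90):
    """Fraction of holdout games in which a resignation would have been wrong.

    A resignation is "bad" when a player's value dropped below the threshold
    (so they would have resigned) but that player went on to win anyway.
    Exact denominators vary; this counts over all holdout games.
    """
    if not games:
        return 0.0
    bad = 0
    for g in games:
        black_resigns = g.min_value_black < resign_threshold
        white_resigns = g.min_value_white < resign_threshold
        if (black_resigns and g.black_won) or (white_resigns and not g.black_won):
            bad += 1
    return bad / len(games)
```

Keeping that number under ~5% is what justifies an aggressive resign threshold without throwing away games the "loser" could still have won.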

I told Andrew "Init to 0 is stupid".
Init to 0 means initializing a new node's value (Q) to 0 (an even position).
I said it then and I'll say it now: this is a bad idea, and it leads to weird behavior:
MCTS explores all 361 moves before spending a 2nd readout on the top policy node.
Still, it's what the paper says, and we expected it to fail quickly, so we tested it.
TL;DR: v11 failed. Win rate wasn't stable and the bad resign rate was impossible to control.
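
To make the weirdness concrete, here is a stripped-down version of PUCT child selection; a sketch only, not MiniGo's mcts.py, and c_puct and the exact formula are assumptions:

```python
import math

def puct_score(child_q, child_prior, child_visits, parent_visits,
               init_q, c_puct=1.5):
    """Score one child for selection; unvisited children fall back to init_q.

    init_q =  0.0  -> "init to even": an unvisited move looks as good as an
                      even position.
    init_q = -1.0  -> "init to loss": an unvisited move looks lost until its
                      prior (the U term) argues for trying it.
    """
    q = child_q if child_visits > 0 else init_q
    u = c_puct * child_prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

def select_child(children, parent_visits, init_q):
    """children: list of (q, prior, visits) tuples; returns the best index."""
    return max(range(len(children)),
               key=lambda i: puct_score(*children[i], parent_visits, init_q))
```

With a roughly flat prior and init_q = 0.0, every unvisited sibling outscores the just-visited child (whose U term has already been halved by the 1 + visits denominator), so selection walks across all 361 points before giving any move a second readout; with init_q = -1.0 the visited child stays on top.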

For v12 we tested reproducibility of our model.
We reverted the v11 changes and ran v10 again (with virtual_loss changed to 2).
virtual_loss is a parameter we use to speed up search by batching 8 (or now 2) positions and evaluating them at the same time.
TL;DR: v12 is similar to v10; this was a test of stability and bootstrap conditions.
We didn't see any measurable differences, so we feel good that our RL setup is stable.
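
For the curious, virtual loss is the standard trick for evaluating several leaves of one search tree in a single network batch: each selected path temporarily gets a pretend losing visit so the next descent picks a different leaf. A rough sketch of the idea, assuming a hypothetical Node structure (not the MiniGo implementation; per-ply sign handling is omitted):

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: list = field(default_factory=list)

    def best_child(self):
        # Stand-in for the real PUCT selection over self.children.
        return max(self.children,
                   key=lambda c: (c.value_sum / c.visits if c.visits else 0.0)
                   + 1.5 * c.prior * math.sqrt(self.visits) / (1 + c.visits))

def collect_leaves(root, virtual_loss=2):
    """Select `virtual_loss` leaves for one batched network evaluation.

    After each selection the path is given a pretend losing visit, so the
    next descent is steered toward a different part of the tree instead of
    picking the same leaf again.
    """
    batch = []
    for _ in range(virtual_loss):
        node, path = root, [root]
        while node.children:              # descend to a leaf
            node = node.best_child()
            path.append(node)
        for n in path:                    # apply the virtual (pretend) loss
            n.visits += 1
            n.value_sum -= 1
        batch.append((node, path))
    return batch

def backup_batch(batch, values):
    """Replace each path's virtual loss with the real value from the network."""
    for (leaf, path), value in zip(batch, values):
        for n in path:
            n.value_sum += value + 1      # undo the pretend loss, add the real value
            # the visit added during selection is kept as the real visit
```

Lowering it from 8 to 2 means less distortion of the search from the pretend losses, at the cost of smaller evaluation batches.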

Sad stuff