In the beginning there was
v3, a 9x9 run.
v2 and v1, if they exist, are lost to history.
After v3 there was
v5. Note: we seem unable to start two runs
in a row, so basically half the numbers are missing.
Little is known about
v5,
the archives suggest it was a
10 block, 128 filter architecture,
with 5M games played.
Oral history, handed down from site admin to site admin, tells that the operator tested
several learning rate changes near the end.
We all love Python; it's a great language, but sometimes you crave speed.
v7 used a C++ binary to go,
direct quote,
"HyperSpeed".
v7 had its successes: better data marshalling, introduction of
Figure 3, bad resign rate graphs, ...
And its issues: we forgot to write SGFs, we cut the learning rate early, ...
It's better not to speak of v8, nor *shudder* mention its name:
Gradients.
v9 was a 20 layer model.
It was also the first model to train using the eight symmetries. Or was it?
"I physically feel sick" - AMJ, upon discovering, three days in, that
use_random_rotation defaults to False.
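For the curious, the eight symmetries are just the rotations and reflections of the
board, applied as data augmentation. Here's a minimal sketch of the idea;
`random_symmetry` and the plane layout are illustrative stand-ins, not MiniGo's actual
code (use_random_rotation is the real flag from the story above):

```python
import numpy as np

def random_symmetry(board_planes, rng=np.random):
    """Apply one of the 8 dihedral symmetries to (19, 19, C) feature planes."""
    k = rng.randint(4)                  # 0-3 quarter turns
    planes = np.rot90(board_planes, k, axes=(0, 1))
    if rng.randint(2):                  # plus an optional reflection
        planes = planes[:, ::-1, :]
    return planes
```

Skipping the augmentation doesn't break anything visibly, it just makes the training
data less diverse, which is exactly how a flag defaulting to False goes unnoticed for
three days.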
Never content, the MiniGo team pushed past "HyperSpeed" straight to
"PetaFlops Speed"
with
v10.
This was
the Real Deal: a
20 layer, 256 filter,
full sized model,
blazing on 640
Cloud TPUs.
I consider this our most serious attempt to reproduce AlphaZero:
We used the published learning rate schedule, batch size, ... (TODO ANDREW).
Andrew valiantly monitored the bad resign rate, aggressively keeping it below 5%.
Our eval showed this was a strong model, surpassing our previous top model, and reaching
pro strength (v7 may have too?).
I told Andrew
"Init to 0 is stupid".
Init to 0 means initializing a new node's value (Q) to 0 (an even position).
I said it then and I'll say it now, this is a bad idea: it leads to weird behavior where
MCTS explores all 361 moves before spending a 2nd readout on the top policy node.
Still, it's what the paper says, and we expected it to fail quickly, so we tested it.
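To see why, here's a toy simulation of PUCT selection with Q initialized to 0. This is
a minimal sketch under assumed constants; `puct`, C_PUCT, and the uniform prior are
illustrative, not MiniGo's actual search code:

```python
import math

C_PUCT = 1.0

def puct(q, n, prior, parent_n):
    """PUCT action score: value estimate Q plus exploration bonus U."""
    return q + C_PUCT * prior * math.sqrt(parent_n) / (1 + n)

priors = [1 / 361] * 361   # roughly uniform policy, for illustration
q = [0.0] * 361            # "init to 0": unvisited children claim an even game
n = [0] * 361
parent_n = 0

visited = []
for _ in range(361):
    best = max(range(361), key=lambda i: puct(q[i], n[i], priors[i], parent_n))
    visited.append(best)
    n[best] += 1
    parent_n += 1
    q[best] = -0.1         # suppose every readout comes back slightly bad for us

print(len(set(visited)))   # 361: every move gets one readout before any gets two
```

As soon as one child's Q dips below zero, all of its untouched siblings (still sitting
at Q = 0) outrank it, so the search wanders sideways instead of deepening.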
TL;DR: v11 failed.
Win rate wasn't stable and the bad resign rate
was impossible to control.
For
v12 we tested
reproducibility of our model.
We reverted the v11 changes and ran v10 again (we changed virtual_loss to 2).
virtual_loss is a parameter we use to speed up the model by batching 8 (or now 2)
positions and evaluating them at the same time.
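For context, here is roughly how a virtual loss makes that batching work. This is a
minimal sketch; `Node`, `select_leaf`, and `evaluate_batch` are illustrative stand-ins,
not MiniGo's actual implementation:

```python
class Node:
    def __init__(self):
        self.N = 0          # visit count
        self.W = 0.0        # total value

def add_virtual_loss(path):
    # Pretend every node on the path just lost once; its Q = W/N drops, so the
    # next descent in this batch steers toward a different leaf.
    for node in path:
        node.N += 1
        node.W -= 1.0

def revert_virtual_loss(path):
    for node in path:
        node.N -= 1
        node.W += 1.0

def run_batch(root, select_leaf, evaluate_batch, virtual_loss=2):
    # Collect `virtual_loss` leaves, charging a virtual loss along each path...
    paths = []
    for _ in range(virtual_loss):
        path = select_leaf(root)        # ordinary PUCT descent, root -> leaf
        add_virtual_loss(path)
        paths.append(path)
    # ...evaluate them all in one forward pass...
    values = evaluate_batch([path[-1] for path in paths])
    # ...then undo the virtual losses and back up the real values.
    for path, value in zip(paths, values):
        revert_virtual_loss(path)
        for node in path:
            node.N += 1
            node.W += value
```

The temporary loss makes an in-flight path look worse, so the batch spreads over
different leaves instead of reading out the same one `virtual_loss` times.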
TL;DR: v12 is similar to v10; this was a test of stability and bootstrap
conditions.
We didn't see any measurable differences, so we feel good that our RL setup is stable.