Mokemokechicken's results were gorgeous: starting from scratch, he got a model that plays at a superhuman level (and beats classic AI engines). I've adapted his code to the game of Connect 4: https://github.com/Zeta36/connect4-alpha-zero
With just a CPU, my model learned in a few hours to play an almost perfect game, defeating every online Connect 4 engine I could find on the Internet.
I also made an adaptation for chess: https://github.com/Zeta36/chess-alpha-zero, but unfortunately I don't have a GPU to train this more complex game. The code is there and functional, though. If somebody has an idle GPU, it would be great to know whether the chess adaptation can learn to play at least at a good amateur level (it would probably take at least a week even with a powerful GPU).
Finally, I'd like to point out how easily the idea behind AlphaGo Zero can be applied to many other problems, almost just by changing the environment model (state, action, reward).
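To illustrate how little the environment needs to expose, here is a minimal Connect 4 environment sketch with just state, legal actions and reward. The class and method names are my own, purely illustrative, and not the ones used in the repositories above:

```python
# Minimal sketch of a Connect 4 environment exposing only state,
# legal actions and reward -- the three things an AlphaZero-style
# loop needs. Names are illustrative, not the repositories' API.

class Connect4Env:
    ROWS, COLS = 6, 7

    def __init__(self):
        # 0 = empty, +1 / -1 = the two players
        self.board = [[0] * self.COLS for _ in range(self.ROWS)]
        self.player = 1      # player to move
        self.winner = None

    def legal_actions(self):
        # a column is playable while its top cell is empty
        return [c for c in range(self.COLS) if self.board[0][c] == 0]

    def step(self, col):
        # drop a piece in `col`; return (state, reward, done),
        # reward from the mover's perspective
        row = max(r for r in range(self.ROWS) if self.board[r][col] == 0)
        self.board[row][col] = self.player
        if self._wins(row, col):
            self.winner = self.player
        self.player = -self.player
        done = self.winner is not None or not self.legal_actions()
        reward = 0 if self.winner is None else 1
        return self.board, reward, done

    def _wins(self, row, col):
        # check the four line directions through the last move
        p = self.board[row][col]
        for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
            count = 1
            for sign in (1, -1):
                r, c = row + sign * dr, col + sign * dc
                while 0 <= r < self.ROWS and 0 <= c < self.COLS \
                        and self.board[r][c] == p:
                    count += 1
                    r, c = r + sign * dr, c + sign * dc
            if count >= 4:
                return True
        return False
```

Swapping this class for a chess or Reversi equivalent is essentially all the "environment model" change amounts to; the self-play and training loop stays the same.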
First of all - wonderful projects. It's really impressive that the idea works in such a general way out of the box. However, to avoid over-hyping the idea of a superhuman-level agent being trained this fast without massively strong hardware, I have to note the following:
@mokemokechicken's model hasn't reached superhuman level yet, let alone surpassed classic AI engines. In fact, even now, @mokemokechicken's best model struggles against a low difficulty setting of a relatively mediocre program.
However, the very fact that substantial progress has been made without a distributed learning environment is definitely impressive.
You are right, @grolich. In fact, I've added a distributed option to the code so we can make use of multiple machines working at the same time.
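A distributed setup like that can be as simple as several self-play workers writing finished games to shared storage that the trainer polls. A rough sketch of that pattern (purely illustrative; the repository's actual mechanism may differ):

```python
# Sketch: file-based sharing between self-play workers and the
# trainer. Each worker dumps finished games as JSON files into a
# shared directory; the trainer loads whatever has accumulated.
# Purely illustrative -- not the repositories' actual code.
import json
import os
import uuid


def save_game(shared_dir, moves, result):
    # unique filename so concurrent workers never collide
    path = os.path.join(shared_dir, f"game-{uuid.uuid4().hex}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"moves": moves, "result": result}, f)
    os.replace(tmp, path)  # atomic: readers never see a partial file


def load_games(shared_dir):
    # the trainer calls this periodically to pick up new games
    games = []
    for name in os.listdir(shared_dir):
        if name.endswith(".json"):
            with open(os.path.join(shared_dir, name)) as f:
                games.append(json.load(f))
    return games
```

The appeal of this design is that workers need no coordination at all: any machine that can see the shared directory can contribute games.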
Also, I've just added a pre-training step using supervised learning on games (from PGN files), so we can bootstrap the policy before starting the self-play improvement. This is similar to what AlphaGo did in its original version.
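The core of that supervised step is just turning each recorded position into a training pair: the expert's move becomes a one-hot policy target, and the final game result becomes the value target, seen from the side to move. A sketch of that conversion (function name and the move encoding are my own, not the repository's actual API):

```python
# Sketch: convert one recorded game into supervised training
# examples for the policy/value network. `positions` is a list of
# (state, move_index) pairs extracted from a parsed PGN game,
# `result` is +1 / 0 / -1 from the first player's point of view,
# and `num_actions` is the size of the policy output.
# All names are illustrative, not the repository's real API.

def game_to_examples(positions, result, num_actions):
    examples = []
    to_move = 1  # the first player moves first; sign flips every ply
    for state, move_index in positions:
        policy = [0.0] * num_actions
        policy[move_index] = 1.0   # one-hot target on the expert move
        value = result * to_move   # outcome from the mover's perspective
        examples.append((state, policy, value))
        to_move = -to_move
    return examples
```

These (state, policy, value) triples can then be fed to the same training code that consumes self-play data, which is why the pre-training slots in so easily.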
Some related (Zero-style) projects in Python:
Mokemokechicken did a wonderful adaptation of this methodology to the game of Reversi: https://github.com/mokemokechicken/reversi-alpha-zero
I hope you like these projects.