DHS (3) [Avatar] Offline
#1
Will this book also cover algorithms and techniques used in the development of AlphaGo Zero?
KevinF (2) [Avatar] Offline
#2
Hi DHS,

Let me try to summarize the differences between AlphaGo Zero and the version that played Lee Sedol. (I have no affiliation with DeepMind and I’m writing this from memory, so please forgive me if I get some of the details wrong.)

1. AGZ used a particular neural network architecture called a residual network. We do not currently plan to cover residual networks in the book; we will cover deep neural network basics, convolutional layers, and some practical network design considerations. (There's a rough sketch of a residual block after this list.)

2. AGLee trained the policy and value networks in separate processes; AGZ trained them together in a single process, using one network with two outputs. We will cover both methods: in chapter 9 we cover training a policy network through reinforcement learning, in chapter 10 we show how to train a value network, and in chapter 11 we show how to do both at once (the actor-critic method). (There's a sketch of a two-output network after this list.)

3. AGLee used feature planes to represent certain go-specific properties, such as the number of liberties a stone has; AGZ used simple feature planes that just indicate where stones are on the board. We do cover how to use feature planes; this is a generally useful technique that can help in other domains. Of course, you are free to encode the board in a minimal AGZ style and run your own experiments that way. As DeepMind proved, the go-specific feature planes are not strictly necessary! (There's a sketch of a minimal encoder after this list.)

4. AGZ included a tree search component in the self-play games, which slightly changes the target of the policy network. You might say that for AGLee, they trained the policy and value networks independently and then integrated them with Monte Carlo Tree Search, whereas with AGZ, they trained the two networks specifically to be a part of the tree search process. This is a very cool innovation! We haven't yet decided whether to cover this technique, or in how much detail; if there is a lot of interest in this particular aspect, we can get into it. We definitely will cover the Monte Carlo Tree Search algorithm in general, and how to integrate it with deep learning. (There's a rough sketch of the self-play target after this list.)
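In case it helps to picture point 1: a residual block is just a couple of convolutional layers with a skip connection wrapped around them. Here is a minimal sketch in Keras; the filter count and block structure are illustrative assumptions, not the exact AGZ configuration.

from keras.layers import Activation, BatchNormalization, Conv2D, Input, add
from keras.models import Model

def residual_block(x, filters=64):
    # Two conv/batch-norm layers plus a skip connection: the block learns
    # a correction ("residual") on top of its input rather than a whole
    # new transformation, which makes very deep networks easier to train.
    y = Conv2D(filters, (3, 3), padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, (3, 3), padding='same')(y)
    y = BatchNormalization()(y)
    y = add([x, y])            # the skip connection
    return Activation('relu')(y)

# A tiny tower of two residual blocks over a 19x19 board with 64 channels.
board_input = Input(shape=(19, 19, 64))
tower = residual_block(residual_block(board_input))
model = Model(inputs=board_input, outputs=tower)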
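For point 2, the "one network with two outputs" idea looks roughly like this in Keras. The layer sizes and the number of input planes are made up for illustration; they are not AGZ's actual dimensions.

from keras.layers import Conv2D, Dense, Flatten, Input
from keras.models import Model

# A shared convolutional body that both heads read from.
board_input = Input(shape=(19, 19, 11), name='board')   # 11 planes is just an example
x = Conv2D(64, (3, 3), padding='same', activation='relu')(board_input)
x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)
flat = Flatten()(x)

# Policy head: a probability for each of the 361 board points.
policy_output = Dense(19 * 19, activation='softmax', name='policy')(flat)
# Value head: a single number in [-1, 1] estimating who will win.
value_output = Dense(1, activation='tanh', name='value')(flat)

model = Model(inputs=board_input, outputs=[policy_output, value_output])
model.compile(
    optimizer='sgd',
    loss={'policy': 'categorical_crossentropy', 'value': 'mse'})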
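For point 3, a feature plane is just a 19x19 grid of numbers, one grid per property you want to show the network. Here is a minimal AGZ-style encoder in plain NumPy; the board interface (a dict from (row, col) to 'b' or 'w') is a hypothetical stand-in, not the book's actual board class.

import numpy as np

def encode_simple_planes(board, next_player):
    # Three planes: the next player's stones, the opponent's stones, and a
    # constant plane marking whose turn it is. A richer AGLee-style encoder
    # would add planes for liberties, ko, recent moves, and so on.
    planes = np.zeros((3, 19, 19))
    for row in range(19):
        for col in range(19):
            stone = board.get((row, col))    # 'b', 'w', or None
            if stone is None:
                continue
            if stone == next_player:
                planes[0, row, col] = 1
            else:
                planes[1, row, col] = 1
    if next_player == 'b':
        planes[2] = 1
    return planes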
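And for point 4, the key change is what the policy head is trained against during self-play: not just the single move that was played, but the whole distribution of visit counts the tree search produced for that position. Very roughly (the function name and shapes here are hypothetical, not from the book or from DeepMind's code):

import numpy as np

def policy_target_from_search(visit_counts, temperature=1.0):
    # Turn MCTS visit counts (one per candidate move) into a training target
    # for the policy head: moves the search explored more often get more
    # probability mass.
    scaled = np.power(visit_counts, 1.0 / temperature)
    return scaled / np.sum(scaled)

# During self-play you record (encoded_board, search_policy, game_result) for
# every position, then fit the two-output network on those triples: the policy
# head learns to predict search_policy, and the value head learns to predict
# game_result.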

Hope that answers your question!

Kevin
194758 (5) [Avatar] Offline
#3
Yes, please cover!
DHS (3) [Avatar] Offline
#4
Thanks for the reply, Kevin.

I hope this book will cover the AGZ techniques and algorithms in as much depth as possible, so that we can apply them to other problem domains. In particular, I hope the innovation of training the two networks specifically to be part of the tree search process will be covered in enough detail.

I'm really looking forward to this book. Thanks!


NaN (1) [Avatar] Offline
#5
Also hoping that AlphaZero in particular (the generalized variant of AlphaGo Zero that can easily be adapted to a broad range of games) will be covered in a final "icing on the cake" chapter. I assume many people primarily want to understand how the top-notch Go engines work so they can apply similar concepts to other games and applications; this chapter could be very helpful for that final step!

That said, seeing a Go engine transition from borderline stupid to pretty good is of course highly fascinating and fun by itself.