Introduced a novel algorithm based solely on reinforcement learning for the game of Go, without requiring human data, guidance, or domain knowledge beyond game rules.
Developed AlphaGo Zero to learn tabula rasa, acting as its own teacher by training a neural network to predict its own move selections and game outcomes.
The neural network architecture combined the policy and value networks into a single network, built from residual blocks of convolutional layers with batch normalisation and rectifier non-linearities.
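The residual-block pattern described above can be sketched in plain NumPy. This is an illustrative toy (tiny channel counts, a simplified batch normalisation without learned scale/shift, and hand-rolled 3x3 convolution), not the paper's actual tower:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def batch_norm(x, eps=1e-5):
    # Simplified per-channel normalisation over spatial dims
    # (no learned gamma/beta, single sample).
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def conv3x3(x, w):
    # x: (C_in, H, W); w: (C_out, C_in, 3, 3); stride 1, zero padding.
    c_out = w.shape[0]
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, wd), dtype=x.dtype)
    for o in range(c_out):
        for i in range(x.shape[0]):
            for di in range(3):
                for dj in range(3):
                    out[o] += w[o, i, di, dj] * xp[i, di:di + h, dj:dj + wd]
    return out

def residual_block(x, w1, w2):
    # conv -> BN -> ReLU -> conv -> BN -> skip connection -> ReLU
    y = relu(batch_norm(conv3x3(x, w1)))
    y = batch_norm(conv3x3(y, w2))
    return relu(y + x)
```

The skip connection (`y + x`) is what lets many such blocks be stacked into a deep tower without vanishing gradients.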
Trained the system using a reinforcement learning algorithm with self-play, where a Monte-Carlo Tree Search (MCTS), guided by the neural network, generated improved move probabilities and game data for iterative network updates.
The MCTS stored prior probabilities, visit counts, and action-values, with simulations selecting moves that maximized an upper confidence bound, and leaf nodes evaluated by the neural network.
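The selection rule described above (maximising an action-value plus an upper-confidence bonus driven by priors and visit counts) can be sketched as follows; the exploration constant `c_puct` here is an illustrative choice, not the paper's tuned value:

```python
import math

def select_action(prior, visit_count, q_value, c_puct=1.0):
    """PUCT-style selection: pick the action maximising Q(s,a) + U(s,a),
    where U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).
    The bonus favours moves with high prior probability and few visits."""
    total_visits = sum(visit_count)
    best, best_score = 0, -float("inf")
    for a in range(len(prior)):
        u = c_puct * prior[a] * math.sqrt(total_visits) / (1 + visit_count[a])
        score = q_value[a] + u
        if score > best_score:
            best, best_score = a, score
    return best
```

As visit counts grow, the bonus term shrinks and selection converges toward the actions with the best empirical action-values.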
Neural network parameters were updated to minimize error between predicted values and self-play outcomes, and to maximize similarity between network move probabilities and MCTS search probabilities, using a loss function: l = (z − v)^2 − π^T log p + c||θ||^2.
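The loss above can be computed directly; this sketch assumes flat NumPy arrays for the probabilities and parameters, and the regularisation constant `c=1e-4` is an illustrative value rather than the source's:

```python
import numpy as np

def alphago_zero_loss(z, v, pi, p, theta, c=1e-4):
    """l = (z - v)^2 - pi^T log p + c * ||theta||^2
    z: self-play game outcome; v: predicted value;
    pi: MCTS search probabilities; p: network move probabilities;
    theta: network parameters (L2-regularised)."""
    value_loss = (z - v) ** 2
    policy_loss = -np.dot(pi, np.log(p))
    l2_penalty = c * np.sum(theta ** 2)
    return value_loss + policy_loss + l2_penalty
```

Minimising the first term fits the value head to game outcomes; minimising the second (a cross-entropy) pulls the policy head toward the stronger MCTS-improved probabilities.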
An initial training instance (20 residual blocks) ran for approximately 3 days, generating 4.9 million self-play games (1,600 MCTS simulations per move), achieving superhuman performance and defeating AlphaGo Lee 100-0 using a single machine with 4 TPUs.
A second, larger instance (40 residual blocks) trained for approximately 40 days, generating 29 million self-play games, achieving an Elo rating of 5,185 and defeating AlphaGo Master 89-11.
Discovered extensive Go knowledge from first principles, including fundamental concepts (fuseki, tesuji, life-and-death, ko, yose) as well as novel strategies beyond the scope of traditional Go knowledge.
The system learned using only raw board history as input features and minimal domain knowledge: game rules, Tromp-Taylor scoring, 19x19 board structure, and symmetries (rotation, reflection, color transposition).
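One way to picture "raw board history as input features" is a stack of binary stone planes over recent positions plus a colour-to-play plane. The plane layout and lookback depth below are assumptions for the sketch, not the source's exact feature specification:

```python
import numpy as np

def encode_board(history, to_play, size=19, lookback=8):
    """Stack binary stone planes for each player over the last
    `lookback` positions (+1 for current player, -1 for opponent),
    plus one constant plane encoding whose turn it is.
    Returns an array of shape (2*lookback + 1, size, size)."""
    planes = []
    for t in range(lookback):
        if t < len(history):
            board = history[-(t + 1)]
        else:
            board = np.zeros((size, size), dtype=np.int8)
        planes.append((board == 1).astype(np.float32))   # current player's stones
        planes.append((board == -1).astype(np.float32))  # opponent's stones
    planes.append(np.full((size, size), float(to_play), dtype=np.float32))
    return np.stack(planes)
```

Because the input is just stone positions over time, hand-crafted Go features (liberties, ladders, etc.) are never supplied; anything beyond the rules must be learned.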
Key team contributions for the "Mastering the Game of Go without Human Knowledge" publication (Nature, October 2017) included: design and implementation of the reinforcement learning algorithm, MCTS search algorithm, and evaluation framework; project management and advisement; and authorship of the paper.