David Silver

Research and Development of AlphaGo Zero

London, United Kingdom

Experience

Jan 2016 - Oct 2017
1 year 10 months
London, United Kingdom

Research and Development of AlphaGo Zero

DeepMind

  • Introduced a novel algorithm based solely on reinforcement learning for the game of Go, without requiring human data, guidance, or domain knowledge beyond game rules.

  • Developed AlphaGo Zero to learn tabula rasa, acting as its own teacher by training a neural network to predict its own move selections and game outcomes.

  • The neural network architecture combined the policy and value networks into a single network with two output heads, built from residual blocks of convolutional layers with batch normalisation and rectifier non-linearities.

  • Trained the system using a reinforcement learning algorithm with self-play, where a Monte-Carlo Tree Search (MCTS), guided by the neural network, generated improved move probabilities and game data for iterative network updates.

  • Each edge of the MCTS tree stored a prior probability, visit count, and action-value; simulations selected moves that maximized an upper confidence bound, and leaf nodes were evaluated by the neural network.

  • Neural network parameters were updated to minimize the error between predicted values and self-play outcomes and to maximize the similarity between the network's move probabilities and the MCTS search probabilities, using the loss function l = (z − v)^2 − π^T log p + c||θ||^2 (both the search bound and this objective are illustrated in the sketch following this list).

  • An initial training instance (20 residual blocks) ran for approximately 3 days, generating 4.9 million self-play games (1,600 MCTS simulations per move), achieving superhuman performance and defeating AlphaGo Lee 100-0 using a single machine with 4 TPUs.

  • A second, larger instance (40 residual blocks) trained for approximately 40 days, generating 29 million self-play games, achieving an Elo rating of 5,185 and defeating AlphaGo Master 89-11.

  • Discovered extensive Go knowledge from first principles, including fundamental concepts (fuseki, tesuji, life-and-death, ko, yose) and novel strategies, surpassing traditional Go knowledge.

  • The system learned using only raw board history as input features and minimal domain knowledge: game rules, Tromp-Taylor scoring, 19x19 board structure, and symmetries (rotation, reflection, color transposition).

  • Key team contributions for the "Mastering the Game of Go without Human Knowledge" publication (Nature, October 2017) included: design and implementation of the reinforcement learning algorithm, MCTS search algorithm, and evaluation framework; project management and advisement; and authorship of the paper.
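
A minimal sketch, in Python, of the search-and-train loop summarized above. It is illustrative only: the exploration constant C_PUCT, the node layout, and the single-position loss helper are assumptions rather than the published implementation; only the form of the upper confidence bound and of the loss l = (z − v)^2 − π^T log p + c||θ||^2 follows the paper.

    import math
    import numpy as np

    C_PUCT = 1.5  # assumed exploration constant (illustrative value)

    class Node:
        """One search-tree node; each edge stores a prior P, visit count N, and value sum W."""
        def __init__(self, prior):
            self.prior = prior        # P(s, a) from the policy head
            self.visits = 0           # N(s, a)
            self.value_sum = 0.0      # W(s, a); Q(s, a) = W / N
            self.children = {}        # move -> Node

        def q(self):
            return self.value_sum / self.visits if self.visits else 0.0

    def select_child(node):
        """Pick the child maximizing Q + U, the upper confidence bound used during simulation."""
        total_visits = sum(child.visits for child in node.children.values())
        def ucb(child):
            u = C_PUCT * child.prior * math.sqrt(total_visits) / (1 + child.visits)
            return child.q() + u
        return max(node.children.items(), key=lambda item: ucb(item[1]))

    def loss(z, v, pi, p, theta, c=1e-4):
        """Per-position loss l = (z - v)^2 - pi^T log p + c * ||theta||^2."""
        value_error = (z - v) ** 2
        policy_error = -np.dot(pi, np.log(p + 1e-10))   # cross-entropy vs. search probabilities
        l2_penalty = c * np.sum(theta ** 2)             # weight regularization
        return value_error + policy_error + l2_penalty

In training this objective would be averaged over mini-batches of positions sampled from recent self-play games; the sketch shows only the per-position form.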

Sep 2016 - Jan 2017
5 months
London, United Kingdom

Research and Development of AlphaGo Master

DeepMind

  • Developed AlphaGo Master, a program that defeated top human professional Go players 60–0 in online games in January 2017.
  • Utilized the same neural network architecture, reinforcement learning algorithm, and MCTS algorithm as AlphaGo Zero.
  • Differed from AlphaGo Zero by incorporating handcrafted features and rollouts derived from AlphaGo Lee.
  • Training was initialized using supervised learning from human game data (see the sketch after this list).
  • Operated on a single machine with 4 TPUs during evaluation games.
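
A rough sketch of that supervised initialization step, assuming a PyTorch-style policy network trained by cross-entropy on expert (position, move) pairs; all names, shapes, and hyperparameters here are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def supervised_init_step(policy_net, optimizer, positions, expert_moves):
        """One supervised step: fit the policy output to human expert moves
        before any reinforcement learning.

        positions:    float tensor of board feature planes, shape (batch, planes, 19, 19)
        expert_moves: long tensor of expert move indices, shape (batch,)
        """
        logits = policy_net(positions)                # (batch, 19*19 + 1) logits over moves and pass
        loss = F.cross_entropy(logits, expert_moves)  # match the human move choices
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()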
Nov 2015 - Mar 2016
5 months
London, United Kingdom

Research and Development of AlphaGo Lee

DeepMind

  • Developed AlphaGo Lee, the program that defeated 18-time world champion Lee Sedol 4–1 in March 2016.
  • Based on a similar architecture to AlphaGo Fan, with significant enhancements.
  • The value network was trained on outcomes of fast self-play games generated by AlphaGo itself, with an iterated training procedure representing an early step towards tabula rasa learning (see the sketch after this list).
  • Featured larger policy and value networks compared to AlphaGo Fan (12 convolutional layers with 256 planes each) and underwent more extensive training.
  • Operated as a distributed system utilizing 48 TPUs for faster neural network evaluations during search.
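
A minimal sketch of the value-network update described above: regressing the predicted value toward the outcome of the self-play game each position came from. It assumes the same PyTorch-style setup as the previous sketch; names and shapes are illustrative.

    import torch
    import torch.nn.functional as F

    def value_regression_step(value_net, optimizer, positions, outcomes):
        """One training step: push v(s) toward the game outcome z in {-1, +1}.

        positions: float tensor (batch, planes, 19, 19)
        outcomes:  float tensor (batch,) of final results from fast self-play games
        """
        v = value_net(positions).squeeze(-1)  # predicted outcome in [-1, 1]
        loss = F.mse_loss(v, outcomes)        # mean of (z - v)^2 over the batch
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()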
Jan 2015 - Oct 2015
10 months
London, United Kingdom

Research and Development of AlphaGo Fan

DeepMind

  • Developed AlphaGo Fan, the program that defeated European Go champion Fan Hui in October 2015 (results published in Nature, 2016).
  • Employed two deep neural networks: a policy network to predict move probabilities and a value network to evaluate board positions.
  • The policy network was initially trained via supervised learning on human expert moves, then refined using policy-gradient reinforcement learning.
  • The value network was trained to predict game winners from games played by the policy network against itself.
  • Combined these neural networks with a Monte-Carlo Tree Search (MCTS) algorithm for lookahead search.
  • The MCTS used the policy network to narrow the search to high-probability moves, and evaluated positions in the tree with the value network combined with Monte-Carlo rollouts under a fast rollout policy (leaf evaluation sketched after this list).
  • Operated as a distributed system across many machines, utilizing 176 GPUs.
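
A sketch of that leaf evaluation, blending the value network's estimate with the result of one fast rollout as V(s) = (1 − λ)·v(s) + λ·z, the mixing used in the 2016 Nature paper; the callables and the default mixing weight are illustrative assumptions.

    def evaluate_leaf(value_net, fast_rollout, leaf_position, lam=0.5):
        """Blend learned evaluation with a rollout result for one leaf position.

        value_net(position)    -> float in [-1, 1], the value network's estimate
        fast_rollout(position) -> +1 or -1, the result of playing the game out
                                  with the fast rollout policy
        """
        v = value_net(leaf_position)      # learned positional evaluation
        z = fast_rollout(leaf_position)   # outcome of one fast simulated game
        return (1.0 - lam) * v + lam * z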

Languages

English
Native
Chinese
Advanced

Education

Oct 2014 - Jun 2015

Imperial College London

Master's, Using Deep Reinforcement Learning to Play Chess · London, United Kingdom

Sep 2004 - Jun 2009

University of Alberta

PhD, Reinforcement Learning and Simulation-Based Search in Computer Go · Edmonton, Canada