David S.

Research and Development of AlphaGo Zero

London, United Kingdom

Experience

Jan 2017 - Oct 2017
10 months
London, United Kingdom

Research and Development of AlphaGo Zero

DeepMind

  • Introduced a novel algorithm based solely on reinforcement learning for the game of Go, without requiring human data, guidance, or domain knowledge beyond game rules.

  • Developed AlphaGo Zero to learn tabula rasa, acting as its own teacher by training a neural network to predict its own move selections and game outcomes.

  • Combined the policy and value networks into a single architecture with two output heads, built from residual blocks of convolutional layers with batch normalisation and rectifier non-linearities (see the architecture sketch after this list).

  • Trained the system using a reinforcement learning algorithm with self-play, where a Monte-Carlo Tree Search (MCTS), guided by the neural network, generated improved move probabilities and game data for iterative network updates.

  • The MCTS stored a prior probability, visit count, and action-value for each edge; simulations selected moves that maximized an upper confidence bound, and leaf positions were evaluated by the neural network.

  • Neural network parameters were updated to minimize the error between predicted values and self-play outcomes and to maximize the similarity between the network's move probabilities and the MCTS search probabilities, using the loss function l = (z − v)^2 − π^T log p + c||θ||^2 (see the search-and-training sketch after this list).

  • An initial training instance (20 residual blocks) ran for approximately 3 days, generating 4.9 million self-play games (1,600 MCTS simulations per move), achieving superhuman performance and defeating AlphaGo Lee 100–0 using a single machine with 4 TPUs.

  • A second, larger instance (40 residual blocks) trained for approximately 40 days, generating 29 million self-play games, achieving an Elo rating of 5,185 and defeating AlphaGo Master 89–11.

  • Discovered extensive Go knowledge from first principles, including fundamental concepts (fuseki, tesuji, life-and-death, ko, yose) and novel strategies, surpassing traditional Go knowledge.

  • The system learned using only raw board history as input features and minimal domain knowledge: game rules, Tromp-Taylor scoring, 19×19 board structure, and symmetries (rotation, reflection, colour transposition).

  • Key team contributions to the "Mastering the Game of Go without Human Knowledge" publication (Nature, October 2017) included design and implementation of the reinforcement learning algorithm, the MCTS algorithm, and the evaluation framework; project management and advising; and authorship of the paper.
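
A minimal sketch of the dual-head architecture described above, assuming a PyTorch framing: residual blocks of 3×3 convolutions with batch normalisation and rectifier non-linearities feed a shared trunk, with separate policy and value heads on top. All class and variable names (PolicyValueNet, ResidualBlock, IN_PLANES) are illustrative, not the original implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

BOARD = 19       # 19x19 board, per the profile
IN_PLANES = 17   # raw board history: 8 past positions per colour + side to move

class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection followed by the rectifier

class PolicyValueNet(nn.Module):
    """Single network with a shared trunk and policy/value heads."""
    def __init__(self, blocks: int = 20, channels: int = 256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(IN_PLANES, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        self.trunk = nn.Sequential(*(ResidualBlock(channels) for _ in range(blocks)))
        # Policy head: one logit per board point plus one for pass.
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.BatchNorm2d(2), nn.ReLU(),
            nn.Flatten(), nn.Linear(2 * BOARD * BOARD, BOARD * BOARD + 1),
        )
        # Value head: a single scalar in [-1, 1] predicting the game outcome.
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.BatchNorm2d(1), nn.ReLU(),
            nn.Flatten(), nn.Linear(BOARD * BOARD, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Tanh(),
        )

    def forward(self, x):
        h = self.trunk(self.stem(x))
        return self.policy_head(h), self.value_head(h)
```

With blocks=20 this corresponds to the first training instance mentioned above; the 40-day run used the same pattern with a 40-block trunk.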
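
The selection rule and loss function in the bullets above can be made concrete with a short sketch. This is an illustrative reading, not the original code: Node holds the stored edge statistics (prior P, visit count N, action-value Q = W/N), select_move maximises a PUCT-style upper confidence bound Q + U, and loss_fn implements l = (z − v)^2 − π^T log p + c||θ||^2. The constant c_puct and all names are assumptions.

```python
import math
import torch
import torch.nn.functional as F

class Node:
    """Stored edge statistics: prior P, visit count N, total value W."""
    def __init__(self, prior: float):
        self.P, self.N, self.W = prior, 0, 0.0
        self.children = {}  # move -> Node

    @property
    def Q(self) -> float:
        return self.W / self.N if self.N else 0.0

def select_move(node: Node, c_puct: float = 1.5):
    """Pick the child maximising the upper confidence bound Q + U."""
    # max(1, ...) keeps the priors effective before any child has been visited
    sqrt_total = math.sqrt(max(1, sum(ch.N for ch in node.children.values())))
    def ucb(child: Node) -> float:
        return child.Q + c_puct * child.P * sqrt_total / (1 + child.N)
    return max(node.children.items(), key=lambda item: ucb(item[1]))

def loss_fn(policy_logits, v, pi, z, parameters, c: float = 1e-4):
    """l = (z - v)^2 - pi^T log p + c * ||theta||^2, averaged over a batch.

    policy_logits, v: network outputs; pi: MCTS search probabilities;
    z: self-play outcomes (+1/-1); parameters: network weights theta.
    """
    value_loss = F.mse_loss(v.squeeze(-1), z)
    policy_loss = -(pi * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
    l2 = c * sum((w ** 2).sum() for w in parameters)
    return value_loss + policy_loss + l2
```

In practice the L2 term is often handled by the optimiser's weight decay; it is written out here to mirror the formula.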

Sep 2016 - Jan 2017
5 months
London, United Kingdom

Research and Development of AlphaGo Master

DeepMind

  • Developed AlphaGo Master, a program that defeated top human professional Go players 60–0 in online games in January 2017.
  • Utilized the same neural network architecture, reinforcement learning algorithm, and MCTS algorithm as AlphaGo Zero.
  • Differed from AlphaGo Zero by incorporating handcrafted features and rollouts derived from AlphaGo Lee.
  • Training was initialized using supervised learning from human game data.
  • Operated on a single machine with 4 TPUs during evaluation games.
Nov 2015 - Mar 2016
5 months
London, United Kingdom

Research and Development of AlphaGo Lee

DeepMind

  • Developed AlphaGo Lee, the program that defeated 18-time world champion Lee Sedol 4–1 in March 2016.
  • Based on a similar architecture to AlphaGo Fan, with significant enhancements.
  • The value network was trained using outcomes from fast self-play games generated by AlphaGo, with an iterated training procedure representing an early step towards tabula rasa learning.
  • Featured larger policy and value networks compared to AlphaGo Fan (12 convolutional layers with 256 planes each) and underwent more extensive training.
  • Operated as a distributed system utilizing 48 TPUs for faster neural network evaluations during search.
Jan 2015 - Oct 2015
10 months
London, United Kingdom

Research and Development of AlphaGo Fan

DeepMind

  • Developed AlphaGo Fan, the program that defeated European Go champion Fan Hui in October 2015 (results published in Nature, 2016).
  • Employed two deep neural networks: a policy network to predict move probabilities and a value network to evaluate board positions.
  • The policy network was initially trained via supervised learning on human expert moves, then refined using policy-gradient reinforcement learning (see the sketch after this list).
  • The value network was trained to predict game winners from games played by the policy network against itself.
  • Combined these neural networks with a Monte-Carlo Tree Search (MCTS) algorithm for lookahead search.
  • The MCTS used the policy network to narrow the search to high-probability moves, and evaluated positions within the search tree with the value network combined with Monte-Carlo rollouts driven by a fast rollout policy.
  • Operated as a distributed system across many machines, utilizing 176 GPUs.
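
A minimal sketch of the policy-gradient refinement step mentioned above, assuming a REINFORCE-style update in PyTorch: moves from a won self-play game are made more probable, moves from a lost game less probable. The function and argument names are hypothetical, and policy_net is assumed to return raw move logits.

```python
import torch.nn.functional as F

def reinforce_update(policy_net, optimizer, states, moves, outcome):
    """One policy-gradient update from a finished self-play game.

    states:  (T, planes, 19, 19) board encodings for the positions played
    moves:   (T,) indices of the moves the network chose
    outcome: +1.0 if the network won the game, -1.0 if it lost
    """
    log_probs = F.log_softmax(policy_net(states), dim=1)
    chosen = log_probs.gather(1, moves.unsqueeze(1)).squeeze(1)
    loss = -(outcome * chosen).mean()  # raise log-likelihood of winning moves
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```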

Languages

English
Native
Chinese
Advanced

Education

Oct 2014 - Jun 2015

Imperial College London

Master's, Using Deep Reinforcement Learning to Play Chess · London, United Kingdom

Sep 2004 - Jun 2009

University of Alberta

Reinforcement Learning and Simulation-Based Search in Computer Go · Edmonton, Canada
