Google DeepMind, 5 New Street Square, London EC4A 3TW, UK.
Google, 1600 Amphitheatre Parkway, Mountain View, California 94043, USA.
Nature. 2016 Jan 28;529(7587):484-9. doi: 10.1038/nature16961.
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses 'value networks' to evaluate board positions and 'policy networks' to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
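The abstract says the search "combines Monte Carlo simulation with value and policy networks." The sketch below shows, in simplified form, where each network could enter one step of such a tree search. It is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the following details, which are assumptions taken from one reading of the paper's Methods: the selection score Q + u, with u proportional to the policy prior P(s, a) and decaying as 1/(1 + N(s, a)); the leaf evaluation that mixes the value-network estimate with a rollout outcome using λ = 0.5; and the constant c_puct = 5. The names `Edge`, `select_action`, `leaf_value` and `backup` are hypothetical. Tree expansion, the rollout policy, the networks themselves and the sign flips between players are all omitted.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one (state, action) edge; hypothetical structure."""
    prior: float              # P(s, a): assumed to come from a policy network
    visits: int = 0           # N(s, a): number of simulations through this edge
    total_value: float = 0.0  # W(s, a): sum of backed-up leaf evaluations

    @property
    def q(self) -> float:
        # Mean action value Q(s, a); 0 for an unvisited edge.
        return self.total_value / self.visits if self.visits else 0.0


def select_action(edges: dict[str, Edge], c_puct: float = 5.0) -> str:
    """Pick argmax of Q + u, where u favours high-prior, rarely visited moves."""
    total = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        u = c_puct * e.prior * math.sqrt(total) / (1 + e.visits)
        return e.q + u

    # max() keeps the first key on ties, so insertion order breaks ties.
    return max(edges, key=lambda a: score(edges[a]))


def leaf_value(value_net_estimate: float, rollout_outcome: float,
               lam: float = 0.5) -> float:
    """Mix the value-network estimate with a Monte Carlo rollout result (assumed lambda)."""
    return (1 - lam) * value_net_estimate + lam * rollout_outcome


def backup(edge: Edge, value: float) -> None:
    """Record one simulation's leaf evaluation on the traversed edge."""
    edge.visits += 1
    edge.total_value += value


if __name__ == "__main__":
    edges = {"a": Edge(prior=0.6), "b": Edge(prior=0.4)}
    backup(edges["a"], leaf_value(0.2, 1.0))  # leaf evaluated at 0.6
    backup(edges["a"], -1.0)                  # a later simulation through "a" loses
    print(select_action(edges))               # exploration now favours "b"
```

In this sketch the policy prior steers which moves get explored, and the mixed leaf evaluation replaces pure random-rollout statistics. After two visits to "a" with a poor average value, the untried but reasonably likely move "b" wins the selection.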