Mastering the game of Go with deep neural networks and tree search.

Affiliations

Google DeepMind, 5 New Street Square, London EC4A 3TW, UK.

Google, 1600 Amphitheatre Parkway, Mountain View, California 94043, USA.

Publication information

Nature. 2016 Jan 28;529(7587):484-9. doi: 10.1038/nature16961.

DOI:10.1038/nature16961

PMID:26819042

Abstract

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses 'value networks' to evaluate board positions and 'policy networks' to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
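The abstract describes a search that uses policy networks to select moves and value networks to evaluate positions, but it does not give the selection rule. The following is a minimal sketch of one way such a search could combine the two, assuming a PUCT-style score (mean value plus a prior-weighted exploration bonus). The names `Node`, `select_child`, `backup`, and the constant `c_puct` are illustrative assumptions, not taken from the paper, and the priors and values are hard-coded numbers standing in for network outputs.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One position in the search tree (hypothetical structure)."""
    prior: float                 # stand-in for the policy network's move probability
    visits: int = 0
    value_sum: float = 0.0       # sum of value estimates backed up through this node
    children: dict = field(default_factory=dict)

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node: Node, c_puct: float = 1.0):
    """Pick the (move, child) pair with the highest value-plus-exploration score."""
    total = sum(child.visits for child in node.children.values())

    def score(child: Node) -> float:
        # Exploration bonus: large for high-prior, rarely visited moves.
        bonus = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.mean_value() + bonus

    return max(node.children.items(), key=lambda item: score(item[1]))


def backup(path, value: float) -> None:
    """Propagate a leaf evaluation up the path, flipping sign at each ply.

    `value` is from the viewpoint of the player who moved into the last node.
    """
    for node in reversed(path):
        node.visits += 1
        node.value_sum += value
        value = -value


if __name__ == "__main__":
    root = Node(prior=1.0, children={"a": Node(prior=0.7), "b": Node(prior=0.3)})
    move, child = select_child(root)
    print("first choice:", move)
    backup([root, child], -1.0)  # pretend the value estimate for this move was bad
    print("second choice:", select_child(root)[0])
```

In this toy setup the search first follows the higher prior, then shifts to the other move once the backed-up value for the first one turns out poor. This illustrates how a prior can guide exploration while value estimates correct it.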

Similar articles

Mastering the game of Go with deep neural networks and tree search.

Nature. 2016 Jan 28;529(7587):484-9. doi: 10.1038/nature16961.

Mastering the game of Go without human knowledge.

Nature. 2017 Oct 18;550(7676):354-359. doi: 10.1038/nature24270.

Google AI algorithm masters ancient game of Go.

Nature. 2016 Jan 28;529(7587):445-6. doi: 10.1038/529445a.

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.

Science. 2018 Dec 7;362(6419):1140-1144. doi: 10.1126/science.aar6404.

Learning to Play the Chess Variant Crazyhouse Above World Champion Level With Deep Neural Networks and Human Data.

Front Artif Intell. 2020 Apr 28;3:24. doi: 10.3389/frai.2020.00024. eCollection 2020.

Evolutionary swarm neural network game engine for Capture Go.

Neural Netw. 2010 Mar;23(2):295-305. doi: 10.1016/j.neunet.2009.11.001. Epub 2009 Nov 20.

Learning to play Go using recursive neural networks.

Neural Netw. 2008 Nov;21(9):1392-400. doi: 10.1016/j.neunet.2008.02.002. Epub 2008 Mar 4.

AlphaDDA: strategies for adjusting the playing strength of a fully trained AlphaZero system to a suitable human training partner.

PeerJ Comput Sci. 2022 Oct 4;8:e1123. doi: 10.7717/peerj-cs.1123. eCollection 2022.

[Deep Learning and AlphaGo].

Brain Nerve. 2019 Jul;71(7):681-694. doi: 10.11477/mf.1416201340.

Human-level control through deep reinforcement learning.

Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.

Cited by

Probing for consciousness in machines.

Front Artif Intell. 2025 Aug 20;8:1610225. doi: 10.3389/frai.2025.1610225. eCollection 2025.

Photonics and microwaves merge to improve computing flexibility.

Light Sci Appl. 2025 Sep 4;14(1):303. doi: 10.1038/s41377-025-01933-8.

Toward the Uniform of Chemical Theory, Simulation, and Experiments in Metaverse Technology.

Precis Chem. 2023 Jun 14;1(4):192-198. doi: 10.1021/prechem.3c00045. eCollection 2023 Jun 26.

Machine learning for estimation and control of quantum systems.

Natl Sci Rev. 2025 Jul 7;12(8):nwaf269. doi: 10.1093/nsr/nwaf269. eCollection 2025 Aug.

A framework for robotic manipulation tasks based on multiple zero shot models.

Sci Rep. 2025 Aug 24;15(1):31141. doi: 10.1038/s41598-025-17015-z.

AlphaFold 3: an unprecedent opportunity for fundamental research and drug development.

Precis Clin Med. 2025 Jul 1;8(3):pbaf015. doi: 10.1093/pcmedi/pbaf015. eCollection 2025 Sep.

Data-driven equation discovery reveals nonlinear reinforcement learning in humans.

Proc Natl Acad Sci U S A. 2025 Aug 5;122(31):e2413441122. doi: 10.1073/pnas.2413441122. Epub 2025 Jul 31.

Artificial Intelligence in Thoracic Surgery: Transforming Diagnostics, Treatment, and Patient Outcomes.

Diagnostics (Basel). 2025 Jul 8;15(14):1734. doi: 10.3390/diagnostics15141734.

Integrated biotechnological and AI innovations for crop improvement.

Nature. 2025 Jul;643(8073):925-937. doi: 10.1038/s41586-025-09122-8. Epub 2025 Jul 23.

Multi-fidelity neural network-based prediction of tensile strength of high-entropy alloy (FeNiCoCrCu) using molecular dynamics data.

J Mol Model. 2025 Jul 22;31(8):214. doi: 10.1007/s00894-025-06439-z.
