DeepMind, London, UK.
University College London, London, UK.
Nature. 2020 Dec;588(7839):604-609. doi: 10.1038/s41586-020-03051-4. Epub 2020 Dec 23.
Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns a model that, when applied iteratively, produces the predictions most directly relevant to planning: the reward, the action-selection policy and the value function. When evaluated on 57 different Atari games (the canonical video game environment for testing artificial-intelligence techniques, in which model-based planning approaches have historically struggled), the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi (canonical environments for high-performance planning), the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm that was supplied with the rules of the game.
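To make the abstract's description of the learned model concrete, the following is a minimal, illustrative sketch of its interface, assuming the representation/dynamics/prediction decomposition used in the full paper. The toy linear layers, weight names and dimensions here are hypothetical stand-ins for MuZero's trained deep networks; it shows only how the model, applied iteratively in hidden-state space, yields the reward, policy and value used by the search.

import numpy as np

rng = np.random.default_rng(0)
HIDDEN, ACTIONS, OBS = 8, 4, 16

# Illustrative parameters standing in for trained network weights.
W_repr = rng.normal(size=(OBS, HIDDEN))              # observation -> hidden state
W_dyn = rng.normal(size=(HIDDEN + ACTIONS, HIDDEN))  # (state, action) -> next state
w_reward = rng.normal(size=HIDDEN)                   # state -> predicted reward
W_policy = rng.normal(size=(HIDDEN, ACTIONS))        # state -> policy logits
w_value = rng.normal(size=HIDDEN)                    # state -> value estimate

def representation(observation):
    # h: encode a raw observation into a hidden state with no imposed
    # environment semantics.
    return np.tanh(observation @ W_repr)

def dynamics(state, action):
    # g: advance the hidden state given an action, and predict the reward
    # of the transition, without simulating the true environment.
    one_hot = np.eye(ACTIONS)[action]
    next_state = np.tanh(np.concatenate([state, one_hot]) @ W_dyn)
    reward = float(next_state @ w_reward)
    return next_state, reward

def prediction(state):
    # f: predict the action-selection policy and the value from a hidden state.
    logits = state @ W_policy
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()
    value = float(state @ w_value)
    return policy, value

# Applying the model iteratively, as the abstract describes: encode the
# observation once, then unroll entirely inside the learned hidden-state
# space. A tree-based search (e.g. MCTS) would branch over actions at each
# step rather than following this single greedy rollout.
state = representation(rng.normal(size=OBS))
for _ in range(3):
    policy, value = prediction(state)
    action = int(np.argmax(policy))
    state, reward = dynamics(state, action)
    print(f"action={action} reward={reward:+.3f} value={value:+.3f}")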