Mastering the game of Stratego with model-free multiagent reinforcement learning.

Affiliation

DeepMind Technologies Ltd., London, UK.

Publication

Science. 2022 Dec 2;378(6623):990-996. doi: 10.1126/science.add4679. Epub 2022 Dec 1.

Abstract

We introduce DeepNash, an autonomous agent that plays the imperfect information game Stratego at a human expert level. Stratego is one of the few iconic board games that artificial intelligence (AI) has not yet mastered. It is a game characterized by a twin challenge: It requires long-term strategic thinking as in chess, but it also requires dealing with imperfect information as in poker. The technique underpinning DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego through self-play from scratch. DeepNash beat existing state-of-the-art AI methods in Stratego and achieved a year-to-date (2022) and all-time top-three ranking on the Gravon games platform, competing with human expert players.
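The abstract describes DeepNash's training only at a high level: a game-theoretic, model-free learner that improves through self-play, without search. The Python sketch below is a toy illustration of that idea. It is not DeepNash's algorithm, which the paper calls Regularized Nash Dynamics and which is not reproduced here. Two copies of a simple multiplicative-weights (Hedge) learner play rock-paper-scissors against each other. Each copy updates only from the payoffs it observes, with no model of the game tree and no search, and the time-averaged strategies approach the game's Nash equilibrium (1/3, 1/3, 1/3). The choice of game, the step size `eta`, the number of steps, and the starting logits are all illustrative assumptions.

```python
import math

# Row player's payoffs in rock-paper-scissors; the column player receives the negation (zero-sum).
PAYOFF = [
    [0.0, -1.0, 1.0],   # rock vs (rock, paper, scissors)
    [1.0, 0.0, -1.0],   # paper
    [-1.0, 1.0, 0.0],   # scissors
]


def softmax(logits):
    """Convert logits to a probability distribution (max-shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hedge_self_play(payoff, steps=20000, eta=0.02):
    """Self-play with Hedge for a 3-action zero-sum game; returns time-averaged strategies.

    Illustrative sketch only: the starting logits below are hypothetical and assume 3 actions.
    """
    n = len(payoff)
    # Deliberately biased starting points so both players begin far from equilibrium.
    row_logits = [1.0, 0.0, 0.0]
    col_logits = [0.0, 1.0, 0.0]
    row_avg = [0.0] * n
    col_avg = [0.0] * n
    for _ in range(steps):
        x = softmax(row_logits)
        y = softmax(col_logits)
        # Accumulate the running average of each player's mixed strategy.
        for i in range(n):
            row_avg[i] += x[i] / steps
            col_avg[i] += y[i] / steps
        # Expected payoff of each pure action against the opponent's current mixed strategy.
        row_values = [sum(payoff[i][j] * y[j] for j in range(n)) for i in range(n)]
        col_values = [-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(n)]
        # Hedge update: shift probability toward actions that scored well this round.
        row_logits = [l + eta * v for l, v in zip(row_logits, row_values)]
        col_logits = [l + eta * v for l, v in zip(col_logits, col_values)]
    return row_avg, col_avg


row_avg, col_avg = hedge_self_play(PAYOFF)
print([round(p, 3) for p in row_avg], [round(p, 3) for p in col_avg])
```

The averaging matters in this sketch: under Hedge, the per-step strategies keep cycling around the equilibrium in rock-paper-scissors, and only their running average settles near uniform play.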

Similar articles

Mastering the game of Stratego with model-free multiagent reinforcement learning.

Science. 2022 Dec 2;378(6623):990-996. doi: 10.1126/science.add4679. Epub 2022 Dec 1.

Grandmaster level in StarCraft II using multi-agent reinforcement learning.

Nature. 2019 Nov;575(7782):350-354. doi: 10.1038/s41586-019-1724-z. Epub 2019 Oct 30.

Student of Games: A unified learning algorithm for both perfect and imperfect information games.

Sci Adv. 2023 Nov 17;9(46):eadg3256. doi: 10.1126/sciadv.adg3256. Epub 2023 Nov 15.

AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong.

Front Artif Intell. 2023 May 12;6:1014561. doi: 10.3389/frai.2023.1014561. eCollection 2023.

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.

Science. 2018 Dec 7;362(6419):1140-1144. doi: 10.1126/science.aar6404.

Human-level play in the game of Diplomacy by combining language models with strategic reasoning.

Science. 2022 Dec 9;378(6624):1067-1074. doi: 10.1126/science.ade9097. Epub 2022 Nov 22.

Human-level performance in 3D multiplayer games with population-based reinforcement learning.

Science. 2019 May 31;364(6443):859-865. doi: 10.1126/science.aau6249.

Mastering the game of Go with deep neural networks and tree search.

Nature. 2016 Jan 28;529(7587):484-9. doi: 10.1038/nature16961.

Superhuman AI for heads-up no-limit poker: Libratus beats top professionals.

Science. 2018 Jan 26;359(6374):418-424. doi: 10.1126/science.aao1733. Epub 2017 Dec 17.

DeepStack: Expert-level artificial intelligence in heads-up no-limit poker.

Science. 2017 May 5;356(6337):508-513. doi: 10.1126/science.aam6960. Epub 2017 Mar 2.

Cited by

Generating synthetic multidimensional molecular time series data for machine learning: considerations.

Front Syst Biol. 2023 Jul 25;3:1188009. doi: 10.3389/fsysb.2023.1188009. eCollection 2023.

Picking strategies in games of cooperation.

Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2319925121. doi: 10.1073/pnas.2319925121. Epub 2025 Jun 16.

Clustering-based Failed goal Aware Hindsight Experience Replay.

PeerJ Comput Sci. 2024 Dec 12;10:e2588. doi: 10.7717/peerj-cs.2588. eCollection 2024.

Motion Hologram: Jointly optimized hologram generation and motion planning for photorealistic 3D displays via reinforcement learning.

Sci Adv. 2025 Jan 31;11(5):eads9876. doi: 10.1126/sciadv.ads9876. Epub 2025 Jan 29.

Reinforcement Learning: A Paradigm Shift in Personalized Blood Glucose Management for Diabetes.

Biomedicines. 2024 Sep 21;12(9):2143. doi: 10.3390/biomedicines12092143.

Student of Games: A unified learning algorithm for both perfect and imperfect information games.

Sci Adv. 2023 Nov 17;9(46):eadg3256. doi: 10.1126/sciadv.adg3256. Epub 2023 Nov 15.

MW-MADDPG: a meta-learning based decision-making method for collaborative UAV swarm.

Front Neurorobot. 2023 Sep 21;17:1243174. doi: 10.3389/fnbot.2023.1243174. eCollection 2023.

A Quantum-like Model of Interdependence for Embodied Human-Machine Teams: Reviewing the Path to Autonomy Facing Complexity and Uncertainty.

Entropy (Basel). 2023 Sep 11;25(9):1323. doi: 10.3390/e25091323.

AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong.

Front Artif Intell. 2023 May 12;6:1014561. doi: 10.3389/frai.2023.1014561. eCollection 2023.

Lessons from natural flight for aviation: then, now and tomorrow.

J Exp Biol. 2023 Apr 25;226(Suppl_1). doi: 10.1242/jeb.245409. Epub 2023 Apr 17.
