Suppr 超能文献

Grandmaster level in StarCraft II using multi-agent reinforcement learning.

Affiliations

DeepMind, London, UK.

Team Liquid, Utrecht, Netherlands.

Publication info

Nature. 2019 Nov;575(7782):350-354. doi: 10.1038/s41586-019-1724-z. Epub 2019 Oct 30.

DOI: 10.1038/s41586-019-1724-z
PMID: 31666705
Abstract

Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions, the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems. Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks. We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players.
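The league of adapting strategies described in the abstract can be made concrete with a small sketch. The function below is a toy illustration of prioritized fictitious self-play (PFSP) opponent sampling, one of the matchmaking mechanisms the paper describes: a learning agent is matched more often against league members it struggles to beat, rather than uniformly against all past opponents. The `(1 - win_rate) ** p` weighting and the exponent value are simplified assumptions for illustration, not the paper's exact scheme.

```python
import random

def pfsp_weights(win_rates, p=2.0):
    """Toy prioritized fictitious self-play (PFSP) weighting.

    win_rates[i] is the learner's estimated probability of beating
    league member i; harder opponents (low win rate) get more weight.
    """
    raw = [(1.0 - w) ** p for w in win_rates]
    total = sum(raw)
    return [r / total for r in raw]

def sample_opponent(win_rates, rng=None):
    """Pick a league member to train against, weighted by difficulty."""
    rng = rng or random.Random()
    weights = pfsp_weights(win_rates)
    return rng.choices(range(len(win_rates)), weights=weights, k=1)[0]

# A learner that beats member 0 90% of the time and member 2 only 10%
# of the time is matched mostly against member 2.
weights = pfsp_weights([0.9, 0.5, 0.1])
```

Compared with uniform self-play, this kind of weighting keeps the learner's training signal focused on the strategies it has not yet countered, which is how the league avoids cycling through forgotten weaknesses.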


Similar articles

1
Grandmaster level in StarCraft II using multi-agent reinforcement learning.
Nature. 2019 Nov;575(7782):350-354. doi: 10.1038/s41586-019-1724-z. Epub 2019 Oct 30.
2
Difference in gaze control ability between low and high skill players of a real-time strategy game in esports.
PLoS One. 2022 Mar 18;17(3):e0265526. doi: 10.1371/journal.pone.0265526. eCollection 2022.
3
Mastering the game of Stratego with model-free multiagent reinforcement learning.
Science. 2022 Dec 2;378(6623):990-996. doi: 10.1126/science.add4679. Epub 2022 Dec 1.
4
Artificial Intelligence Accidentally Learned Ecology through Video Games.
Trends Ecol Evol. 2020 Jul;35(7):557-560. doi: 10.1016/j.tree.2020.04.006. Epub 2020 May 7.
5
Human-level control through deep reinforcement learning.
Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.
6
Human-level performance in 3D multiplayer games with population-based reinforcement learning.
Science. 2019 May 31;364(6443):859-865. doi: 10.1126/science.aau6249.
7
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.
Science. 2018 Dec 7;362(6419):1140-1144. doi: 10.1126/science.aar6404.
8
Human-level play in the game of Diplomacy by combining language models with strategic reasoning.
Science. 2022 Dec 9;378(6624):1067-1074. doi: 10.1126/science.ade9097. Epub 2022 Nov 22.
9
Toward a Psychology of Deep Reinforcement Learning Agents Using a Cognitive Architecture.
Top Cogn Sci. 2022 Oct;14(4):756-779. doi: 10.1111/tops.12573. Epub 2021 Sep 1.
10
Comparison of brain activation in response to two dimensional and three dimensional on-line games.
Psychiatry Investig. 2013 Jun;10(2):115-20. doi: 10.4306/pi.2013.10.2.115. Epub 2013 May 30.

Cited by

1
Generating synthetic multidimensional molecular time series data for machine learning: considerations.
Front Syst Biol. 2023 Jul 25;3:1188009. doi: 10.3389/fsysb.2023.1188009. eCollection 2023.
2
Int-HRL: towards intention-based hierarchical reinforcement learning.
Neural Comput Appl. 2025;37(23):18823-18834. doi: 10.1007/s00521-024-10596-2. Epub 2024 Dec 11.
3
Data-driven equation discovery reveals nonlinear reinforcement learning in humans.
Proc Natl Acad Sci U S A. 2025 Aug 5;122(31):e2413441122. doi: 10.1073/pnas.2413441122. Epub 2025 Jul 31.
4
A scalable reinforcement learning framework inspired by hippocampal memory mechanisms for efficient contextual and sequential decision making.
Sci Rep. 2025 Jul 12;15(1):25221. doi: 10.1038/s41598-025-10586-x.
5
A multimodal deep reinforcement learning approach for IoT-driven adaptive scheduling and robustness optimization in global logistics networks.
Sci Rep. 2025 Jul 12;15(1):25195. doi: 10.1038/s41598-025-10512-1.
6
On the construction of artificial general intelligence based on the correspondence between goals and means.
Front Artif Intell. 2025 Jun 18;8:1588726. doi: 10.3389/frai.2025.1588726. eCollection 2025.
7
Dimensions underlying the representational alignment of deep neural networks with humans.
Nat Mach Intell. 2025;7(6):848-859. doi: 10.1038/s42256-025-01041-7. Epub 2025 Jun 23.
8
Perceptual interventions ameliorate statistical discrimination in learning agents.
Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2319933121. doi: 10.1073/pnas.2319933121. Epub 2025 Jun 16.
9
Deep mechanism design: Learning social and economic policies for human benefit.
Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2319949121. doi: 10.1073/pnas.2319949121. Epub 2025 Jun 16.
10
Picking strategies in games of cooperation.
Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2319925121. doi: 10.1073/pnas.2319925121. Epub 2025 Jun 16.