

Learning Macromanagement in Starcraft by Deep Reinforcement Learning.

Affiliations

School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China.

CRISE, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China.

Publication Information

Sensors (Basel). 2021 May 11;21(10):3332. doi: 10.3390/s21103332.

DOI: 10.3390/s21103332
PMID: 34065012
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8150573/
Abstract

StarCraft is a real-time strategy game that provides a complex environment for AI research. Macromanagement, i.e., selecting the appropriate units to build given the current state, is one of the most important problems in this game. To reduce the need for expert knowledge and enhance the coordination of the systematic bot, we apply reinforcement learning (RL) to the problem of macromanagement. We propose a novel deep RL method, Mean Asynchronous Advantage Actor-Critic (MA3C), which computes the approximate expected policy gradient instead of the gradient of the sampled action to reduce the variance of the gradient, and encodes the history queue with a recurrent neural network to handle imperfect information. The experimental results show that MA3C achieves a very high win rate of approximately 90% against the weaker opponents and improves the win rate by about 30% against the stronger opponents. We also propose a novel method to visualize and interpret the policy learned by MA3C. Combining the visualized results with snapshots of games, we find that the learned macromanagement not only adapts to the game rules and the policy of the opponent bot, but also cooperates well with the other modules of MA3C-Bot.
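The variance-reduction idea described in the abstract — averaging the policy gradient over all actions weighted by the current policy, rather than using the gradient of a single sampled action — can be sketched as follows. This is a minimal illustrative sketch, assuming a discrete action space and per-action advantage estimates; the function names (`sampled_pg_grad`, `expected_pg_grad`) are invented for illustration and are not taken from the paper's code.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over action logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def sampled_pg_grad(logits, action, advantage):
    """Classic A3C-style estimator: gradient of -log pi(a|s) * A(s,a)
    w.r.t. the logits, using the single sampled action."""
    pi = softmax(logits)
    grad_logpi = -pi.copy()
    grad_logpi[action] += 1.0      # d log pi(a|s) / d logits for a softmax policy
    return -advantage * grad_logpi

def expected_pg_grad(logits, advantages):
    """Expected ("mean") estimator: sum over ALL actions of
    pi(a|s) * A(s,a) * d log pi(a|s)/d logits. Averaging over the whole
    action distribution removes the sampling variance of the
    single-action estimator above."""
    pi = softmax(logits)
    grad = np.zeros_like(logits)
    for a, adv in enumerate(advantages):
        g = -pi.copy()
        g[a] += 1.0
        grad += pi[a] * adv * g
    return -grad
```

Because the expected estimator is exactly the policy-weighted average of the sampled one, both have the same mean; the expected form simply has zero variance with respect to the action draw, at the cost of needing an advantage estimate for every action.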


Figures (1-7):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9413/8150573/d2f8d1039034/sensors-21-03332-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9413/8150573/9ba4d0300eec/sensors-21-03332-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9413/8150573/bbf6befd7fc8/sensors-21-03332-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9413/8150573/9d990a465bb7/sensors-21-03332-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9413/8150573/4194ba7e1d41/sensors-21-03332-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9413/8150573/59f8d3f631f7/sensors-21-03332-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9413/8150573/c239209491bd/sensors-21-03332-g007.jpg

