
Sample-efficient multi-agent reinforcement learning with masked reconstruction.

Affiliation

School of Industrial and Management Engineering, Korea University, Seoul, Republic of Korea.

Publication

PLoS One. 2023 Sep 14;18(9):e0291545. doi: 10.1371/journal.pone.0291545. eCollection 2023.

DOI: 10.1371/journal.pone.0291545
PMID: 37708154
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10501567/
Abstract

Deep reinforcement learning (DRL) is a powerful approach that combines reinforcement learning (RL) and deep learning to address complex decision-making problems in high-dimensional environments. Although DRL has been remarkably successful, its low sample efficiency necessitates extensive training times and large amounts of data to learn optimal policies. These limitations are more pronounced in the context of multi-agent reinforcement learning (MARL). To address these limitations, various studies have been conducted to improve DRL. In this study, we propose an approach that combines a masked reconstruction task with QMIX (M-QMIX). By introducing a masked reconstruction task as an auxiliary task, we aim to achieve enhanced sample efficiency, a fundamental limitation of RL in multi-agent systems. Experiments were conducted using the StarCraft II micromanagement benchmark to validate the effectiveness of the proposed method. We used 11 scenarios comprising five easy, three hard, and three very hard scenarios. We particularly focused on using a limited number of time steps for each scenario to demonstrate the improved sample efficiency. Compared to QMIX, the proposed method is superior in eight of the 11 scenarios. These results provide strong evidence that the proposed method is more sample-efficient than QMIX, demonstrating that it effectively addresses the limitations of DRL in multi-agent systems.
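The auxiliary objective described in the abstract can be illustrated with a minimal sketch: randomly mask part of each agent's observation, reconstruct it, and score only the masked entries. This is not the authors' implementation; the toy linear encoder/decoder, mask ratio, and loss weighting below are all hypothetical stand-ins for the learned networks in M-QMIX.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_reconstruction_loss(obs, mask_ratio=0.5):
    """Sketch of a masked-reconstruction auxiliary loss: mask a random
    fraction of each agent's observation features, reconstruct the
    observation from the masked input, and take the mean squared error
    over the masked positions only."""
    n_agents, obs_dim = obs.shape
    # Bernoulli mask: True = keep the feature, False = mask it out.
    keep = rng.random((n_agents, obs_dim)) > mask_ratio
    masked_obs = obs * keep
    # Toy linear encoder/decoder weights stand in for learned networks.
    w_enc = rng.normal(size=(obs_dim, 16)) * 0.1
    w_dec = rng.normal(size=(16, obs_dim)) * 0.1
    recon = np.tanh(masked_obs @ w_enc) @ w_dec
    # Only masked positions contribute to the loss.
    masked = ~keep
    return float(((recon - obs) ** 2 * masked).sum() / max(masked.sum(), 1))

obs = rng.normal(size=(3, 8))  # 3 agents, 8-dimensional observations
aux_loss = masked_reconstruction_loss(obs)
# In training, the auxiliary loss would be added to the RL objective
# with some weighting coefficient, e.g.:
#   total_loss = td_loss + lambda_aux * aux_loss
```

In the paper's setting the reconstruction network shares its encoder with the agents' Q-networks, so gradients from the auxiliary task shape the representation used for control; the sketch above only shows the loss computation itself.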


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1eb/10501567/7651d0cd48c4/pone.0291545.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1eb/10501567/f6095c229a10/pone.0291545.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1eb/10501567/fa0254c6756f/pone.0291545.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1eb/10501567/6dcd6438c7ad/pone.0291545.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1eb/10501567/6a290d8e5316/pone.0291545.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1eb/10501567/88b073ea7a05/pone.0291545.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b1eb/10501567/56a24f665524/pone.0291545.g007.jpg

Similar articles

1
Sample-efficient multi-agent reinforcement learning with masked reconstruction.
PLoS One. 2023 Sep 14;18(9):e0291545. doi: 10.1371/journal.pone.0291545. eCollection 2023.
2
TIMAR: Transition-informed representation for sample-efficient multi-agent reinforcement learning.
Neural Netw. 2025 Apr;184:107081. doi: 10.1016/j.neunet.2024.107081. Epub 2024 Dec 31.
3
Credit assignment with predictive contribution measurement in multi-agent reinforcement learning.
Neural Netw. 2023 Jul;164:681-690. doi: 10.1016/j.neunet.2023.05.021. Epub 2023 May 20.
4
Strangeness-driven exploration in multi-agent reinforcement learning.
Neural Netw. 2024 Apr;172:106149. doi: 10.1016/j.neunet.2024.106149. Epub 2024 Jan 26.
5
Skill matters: Dynamic skill learning for multi-agent cooperative reinforcement learning.
Neural Netw. 2025 Jan;181:106852. doi: 10.1016/j.neunet.2024.106852. Epub 2024 Nov 2.
6
STACoRe: Spatio-temporal and action-based contrastive representations for reinforcement learning in Atari.
Neural Netw. 2023 Mar;160:1-11. doi: 10.1016/j.neunet.2022.12.018. Epub 2022 Dec 29.
7
Predictive hierarchical reinforcement learning for path-efficient mapless navigation with moving target.
Neural Netw. 2023 Aug;165:677-688. doi: 10.1016/j.neunet.2023.06.007. Epub 2023 Jun 10.
8
A Hybrid Online Off-Policy Reinforcement Learning Agent Framework Supported by Transformers.
Int J Neural Syst. 2023 Dec;33(12):2350065. doi: 10.1142/S012906572350065X. Epub 2023 Oct 20.
9
A multi-agent reinforcement learning based approach for automatic filter pruning.
Sci Rep. 2024 Dec 28;14(1):31193. doi: 10.1038/s41598-024-82562-w.
10
Efficient Deep Reinforcement Learning With Imitative Expert Priors for Autonomous Driving.
IEEE Trans Neural Netw Learn Syst. 2023 Oct;34(10):7391-7403. doi: 10.1109/TNNLS.2022.3142822. Epub 2023 Oct 5.

Cited by

1
A robot scheduling method based on rMAPPO for H-beam riveting and welding work cell.
PLoS One. 2025 Sep 4;20(9):e0331515. doi: 10.1371/journal.pone.0331515. eCollection 2025.

References

1
Masked Contrastive Representation Learning for Reinforcement Learning.
IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3421-3433. doi: 10.1109/TPAMI.2022.3176413. Epub 2023 Feb 3.
2
MSPM: A modularized and scalable multi-agent reinforcement learning-based system for financial portfolio management.
PLoS One. 2022 Feb 18;17(2):e0263689. doi: 10.1371/journal.pone.0263689. eCollection 2022.
3
Multi-agent reinforcement learning with approximate model learning for competitive games.
PLoS One. 2019 Sep 11;14(9):e0222215. doi: 10.1371/journal.pone.0222215. eCollection 2019.
4
Emergence of linguistic conventions in multi-agent reinforcement learning.
PLoS One. 2018 Nov 29;13(11):e0208095. doi: 10.1371/journal.pone.0208095. eCollection 2018.
5
Multiagent cooperation and competition with deep reinforcement learning.
PLoS One. 2017 Apr 5;12(4):e0172395. doi: 10.1371/journal.pone.0172395. eCollection 2017.
6
Decentralized Opportunistic Spectrum Resources Access Model and Algorithm toward Cooperative Ad-Hoc Networks.
PLoS One. 2016 Jan 4;11(1):e0145526. doi: 10.1371/journal.pone.0145526. eCollection 2016.
7
Human-level control through deep reinforcement learning.
Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.