Multiexperience-Assisted Efficient Multiagent Reinforcement Learning.

Authors

Zhang Tianle, Liu Zhen, Yi Jianqiang, Wu Shiguang, Pu Zhiqiang, Zhao Yanjie

Publication

IEEE Trans Neural Netw Learn Syst. 2024 Sep;35(9):12678-12692. doi: 10.1109/TNNLS.2023.3264275. Epub 2024 Sep 3.

DOI: 10.1109/TNNLS.2023.3264275
PMID: 37037246
Abstract

Recently, multiagent reinforcement learning (MARL) has shown great potential for learning cooperative policies in multiagent systems (MASs). However, a notable drawback of current MARL is its low sample efficiency, which requires a huge number of interactions with the environment. This heavy interaction requirement greatly hinders the real-world application of MARL. Fortunately, effectively incorporating experience knowledge can help MARL quickly find effective solutions, which significantly alleviates this drawback. In this article, a novel multiexperience-assisted reinforcement learning (MEARL) method is proposed to improve the learning efficiency of MASs. Specifically, monotonicity-constrained reward shaping is designed using expert experience to provide additional individual rewards that guide multiagent learning efficiently, while guaranteeing the invariance of the team optimization objective. Furthermore, a reward distribution estimator is developed to model the implicit reward distribution of the environment from transition experience, i.e., collected samples of state-action pairs, rewards, and next states. This estimator predicts the expected reward of each agent for the action taken, allowing the state value function to be estimated accurately and to converge faster. The performance of MEARL is evaluated on two multiagent environment platforms: our designed unmanned aerial vehicle combat (UAV-C) environment and StarCraft II Micromanagement (SCII-M). Simulation results demonstrate that the proposed MEARL greatly improves the learning efficiency and performance of MASs and is superior to state-of-the-art methods on multiagent tasks.
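The invariance guarantee mentioned in the abstract is the hallmark of potential-based reward shaping. The abstract does not detail MEARL's monotonicity-constrained design, so the sketch below shows only the classic potential-based construction; the names `shaped_reward` and `phi` are illustrative, and the distance-to-target potential is a hypothetical stand-in for expert experience.

```python
# Illustrative sketch only (not the paper's implementation):
# potential-based reward shaping adds individual guidance rewards
# while provably leaving the optimal policy, and hence the team
# optimization objective, unchanged.

def shaped_reward(reward, state, next_state, phi, gamma=0.99):
    """Return r + gamma * phi(s') - phi(s).

    The phi terms telescope along any trajectory, so every policy's
    return shifts by the same constant and the ranking of policies
    is preserved.
    """
    return reward + gamma * phi(next_state) - phi(state)

# Hypothetical expert potential: negative distance to a target at 10.0.
phi = lambda s: -abs(s - 10.0)

# Moving closer to the target (8.0 -> 9.0) earns a positive shaping
# bonus even though the environment reward itself is zero.
bonus = shaped_reward(0.0, state=8.0, next_state=9.0, phi=phi)
```

Because the shaping terms cancel telescopically over any trajectory, shaping of this form steers exploration without altering which joint policy is optimal, which is the property the abstract's "invariance guarantee" refers to.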


Similar Articles

1
Multiexperience-Assisted Efficient Multiagent Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2024 Sep;35(9):12678-12692. doi: 10.1109/TNNLS.2023.3264275. Epub 2024 Sep 3.
2
Strangeness-driven exploration in multi-agent reinforcement learning.
Neural Netw. 2024 Apr;172:106149. doi: 10.1016/j.neunet.2024.106149. Epub 2024 Jan 26.
3
UNMAS: Multiagent Reinforcement Learning for Unshaped Cooperative Scenarios.
IEEE Trans Neural Netw Learn Syst. 2023 Apr;34(4):2093-2104. doi: 10.1109/TNNLS.2021.3105869. Epub 2023 Apr 4.
4
Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance.
Sensors (Basel). 2020 Aug 13;20(16):4546. doi: 10.3390/s20164546.
5
SATF: A Scalable Attentive Transfer Framework for Efficient Multiagent Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6627-6641. doi: 10.1109/TNNLS.2024.3387397. Epub 2025 Apr 4.
6
Attentive Relational State Representation in Decentralized Multiagent Reinforcement Learning.
IEEE Trans Cybern. 2022 Jan;52(1):252-264. doi: 10.1109/TCYB.2020.2979803. Epub 2022 Jan 11.
7
Adaptive Individual Q-Learning: A Multiagent Reinforcement Learning Method for Coordination Optimization.
IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):7739-7750. doi: 10.1109/TNNLS.2024.3385097. Epub 2025 Apr 4.
8
A Distributional Perspective on Multiagent Cooperation With Deep Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):4246-4259. doi: 10.1109/TNNLS.2022.3202097. Epub 2024 Feb 29.
9
SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multiagent Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst. 2023 Jan;34(1):52-63. doi: 10.1109/TNNLS.2021.3089493. Epub 2023 Jan 5.
10
Lateral Transfer Learning for Multiagent Reinforcement Learning.
IEEE Trans Cybern. 2023 Mar;53(3):1699-1711. doi: 10.1109/TCYB.2021.3108237. Epub 2023 Feb 15.