
Representation-driven sampling and adaptive policy resetting for improving multi-Agent reinforcement learning.

Authors

Jin Weiqiang, Tian Xingwu, Wang Ningwei, Wu Baohai, Shi Bohang, Zhao Biao, Yang Guang

Affiliations

School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China.

School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China; Artificial Intelligence Institute of iFLYTEK Research, Hefei, Anhui 230088, China.

Publication

Neural Netw. 2025 Jul 15;192:107875. doi: 10.1016/j.neunet.2025.107875.

DOI: 10.1016/j.neunet.2025.107875
PMID: 40684699
Abstract

Multi-agent reinforcement learning (MARL) plays a pivotal role in solving complex decision-making problems wherein multiple agents interact in a shared environment. However, mainstream MARL algorithms still suffer the following challenges: 1) the policies of agents tend to converge and stabilise during learning, which leads to insufficient explorations and sub-optimal strategies, particularly in environments with extremely large state, observation and action spaces and 2) the sampling inefficiency of MARL results in inadequate learning from the experience replay buffer, requiring a massive number of environmental interactions. To address these issues, we propose a novel MARL approach for various multi-agent decision-making tasks, namely efficient eXploration Joint with Training Unbiased for MARL (eXJTU-MARL), to fully enhance exploration efficiency during environmental interactions and the trajectory learning efficiency from the experience replay buffer. To achieve this, we introduce two core modules in eXJTU-MARL: adaptive policy resetting and state representation based balanced experience sampling. Specifically, for the first time, we introduce a state representation based sampling strategy that enhances data efficiency by improving the quality of experience replay samples in MARL. Accordingly, eXJTU-MARL effectively enhances sample efficiency, prevents agents from prematurely converging into sub-optimal policies and facilitates sufficient exploration of the state-action space. Extensive experiments in the StarCraft Multi-Agent Challenge environment demonstrate that our eXJTU-MARL consistently outperforms mainstream MARL baselines, highlighting the effectiveness of adaptive policy resetting and balanced experience sampling in enhancing the overall exploration capabilities and learning efficiency of MARL models in complex multi-agent environments. The code is available at GitHub: https://github.com/albert-jin/eXJTU-MARL.
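The state-representation-based balanced sampling the abstract describes can be illustrated with a minimal sketch: embed stored states, bucket the embeddings, and draw roughly evenly across buckets so frequently visited states no longer dominate the replay batch. The encoder here is just a principal-component projection, a stand-in for whatever learned representation eXJTU-MARL actually uses; function names and bucket counts are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def balanced_sample(states, batch_size, n_buckets=4):
    """Bucket transitions by a coarse state representation, then draw
    close to equally from each non-empty bucket so rare states are
    not drowned out by the dominant ones."""
    # project states onto their leading principal component
    centered = states - states.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]

    # split the projected scores into equal-width buckets
    edges = np.linspace(scores.min(), scores.max(), n_buckets + 1)
    bucket_ids = np.clip(np.digitize(scores, edges[1:-1]), 0, n_buckets - 1)

    # sample evenly across non-empty buckets
    per_bucket = max(1, batch_size // n_buckets)
    chosen = []
    for b in range(n_buckets):
        idx = np.flatnonzero(bucket_ids == b)
        if idx.size:
            take = min(per_bucket, idx.size)
            chosen.extend(rng.choice(idx, size=take, replace=False))
    return np.array(chosen[:batch_size])

# toy buffer: 200 transitions whose states cluster heavily in one region
states = np.vstack([rng.normal(0, 0.1, (180, 3)),
                    rng.normal(5, 0.1, (20, 3))])
batch = balanced_sample(states, batch_size=32)
```

Under uniform sampling the rare cluster would contribute about 10% of a batch; the bucketed draw gives it a share comparable to the dense cluster.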

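The other module, adaptive policy resetting, counters the premature-convergence problem the abstract raises. A minimal sketch, assuming a shrink-and-perturb style reset triggered when mean policy entropy collapses; the entropy floor, blend ratio, and class names are illustrative assumptions, not the paper's actual mechanism.

```python
import numpy as np

rng = np.random.default_rng(1)

class ResetController:
    """Partially re-initialize policy weights when average action-distribution
    entropy drops below a floor, restoring exploration capacity."""

    def __init__(self, entropy_floor=0.2, mix=0.5):
        self.entropy_floor = entropy_floor
        self.mix = mix  # fraction of fresh random weights blended in

    def maybe_reset(self, weights, action_probs):
        # mean entropy of the current per-state action distributions
        p = np.clip(action_probs, 1e-8, 1.0)
        entropy = -(p * np.log(p)).sum(axis=-1).mean()
        if entropy < self.entropy_floor:
            fresh = rng.normal(0, 0.1, size=weights.shape)
            weights = (1 - self.mix) * weights + self.mix * fresh
        return weights, entropy

ctrl = ResetController()
w = rng.normal(0, 0.1, size=(8, 4))
# a nearly deterministic policy -> low entropy -> reset fires
probs = np.tile([0.97, 0.01, 0.01, 0.01], (16, 1))
w2, h = ctrl.maybe_reset(w, probs)
```

Blending toward fresh weights rather than fully re-initializing preserves some of what the agents have learned while still injecting enough randomness to escape a sub-optimal policy.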

Similar Articles

1. Representation-driven sampling and adaptive policy resetting for improving multi-Agent reinforcement learning. Neural Netw. 2025 Jul 15;192:107875. doi: 10.1016/j.neunet.2025.107875.
2. Attacking cooperative multi-agent reinforcement learning by adversarial minority influence. Neural Netw. 2025 Nov;191:107747. doi: 10.1016/j.neunet.2025.107747. Epub 2025 Jun 21.
3. Actor critic with experience replay-based automatic treatment planning for prostate cancer intensity modulated radiotherapy. Med Phys. 2025 Jul;52(7):e17915. doi: 10.1002/mp.17915. Epub 2025 May 31.
4. Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization. IEEE Trans Neural Netw Learn Syst. 2025 Oct;36(10):18118-18132. doi: 10.1109/TNNLS.2025.3577259.
5. Shapley value-driven multi-modal deep reinforcement learning for complex decision-making. Neural Netw. 2025 Nov;191:107650. doi: 10.1016/j.neunet.2025.107650. Epub 2025 Jun 21.
6. Short-Term Memory Impairment
7. MARLens: Understanding Multi-Agent Reinforcement Learning for Traffic Signal Control via Visual Analytics. IEEE Trans Vis Comput Graph. 2025 Jul;31(7):4018-4033. doi: 10.1109/TVCG.2024.3392587.
8. Counterfactual value decomposition for cooperative multi-agent reinforcement learning. Neural Netw. 2025 Oct;190:107692. doi: 10.1016/j.neunet.2025.107692. Epub 2025 Jun 16.
9. Multi-Task Multi-Agent Reinforcement Learning With Interaction and Task Representations. IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):13431-13445. doi: 10.1109/TNNLS.2024.3475216.
10. Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis. Cochrane Database Syst Rev. 2021 Apr 19;4(4):CD011535. doi: 10.1002/14651858.CD011535.pub4.