
Generative subgoal oriented multi-agent reinforcement learning through potential field.

Affiliations

Academy of Military Science, Beijing, 100000, China.

Publication Information

Neural Netw. 2024 Nov;179:106552. doi: 10.1016/j.neunet.2024.106552. Epub 2024 Jul 17.

DOI: 10.1016/j.neunet.2024.106552
PMID: 39089154
Abstract

Multi-agent reinforcement learning (MARL) can markedly improve agents' learning speed in sparse-reward tasks when guided by subgoals. However, existing works sever the consistency of the learning objectives between the subgoal-generation and subgoal-reaching stages, which significantly inhibits the effectiveness of subgoal learning. To address this problem, we propose a novel Potential field Subgoal-based Multi-Agent reinforcement learning (PSMA) method, which introduces the potential field (PF) to unify the learning objectives of the two stages. Specifically, we design a state-to-PF representation model that describes agents' states as potential fields, allowing easy measurement of the interaction effect of both allied and enemy agents. With the PF representation, a subgoal selector is designed to automatically generate multiple subgoals for each agent, drawn from an experience replay buffer that contains both individual and total PF values. Based on the determined subgoals, we define an intrinsic reward function that guides each agent to reach its respective subgoal while maximizing the joint action-value. Experimental results show that our method outperforms state-of-the-art MARL methods on both StarCraft II micro-management (SMAC) and Google Research Football (GRF) tasks with sparse reward settings.
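The pipeline the abstract describes (state-to-PF representation, then a subgoal-conditioned intrinsic reward) can be sketched roughly as below. This is a minimal illustration, not the paper's actual formulation: the signed inverse-distance field, the function names `potential_field` and `intrinsic_reward`, and the absolute-difference reward shape are all assumptions made for the example.

```python
import numpy as np

def potential_field(agent_pos, entity_positions, signs, eps=1e-6):
    """Toy state-to-PF representation: sum of signed inverse-distance
    contributions, with allied entities contributing positively (+1)
    and enemy entities negatively (-1), evaluated at the agent's position."""
    agent_pos = np.asarray(agent_pos, dtype=float)
    pf = 0.0
    for pos, sign in zip(entity_positions, signs):
        dist = np.linalg.norm(agent_pos - np.asarray(pos, dtype=float))
        pf += sign / (dist + eps)  # closer entities dominate the field value
    return pf

def intrinsic_reward(agent_pf, subgoal_pf, scale=1.0):
    """Hypothetical subgoal-based intrinsic reward: the closer the agent's
    current PF value is to its subgoal's PF value, the higher the reward
    (maximum 0 when they coincide)."""
    return -scale * abs(agent_pf - subgoal_pf)

# An agent at the origin, one ally at (1, 0) and one enemy at (3, 0):
pf_now = potential_field([0.0, 0.0], [(1.0, 0.0), (3.0, 0.0)], [+1, -1])
r_int = intrinsic_reward(pf_now, subgoal_pf=1.0)
```

In a full training loop this intrinsic term would typically be added to the (sparse) environment reward, so the agent receives dense feedback for moving toward its assigned subgoal even before the task reward appears.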

Similar Articles

1. Generative subgoal oriented multi-agent reinforcement learning through potential field.
   Neural Netw. 2024 Nov;179:106552. doi: 10.1016/j.neunet.2024.106552. Epub 2024 Jul 17.
2. LJIR: Learning Joint-Action Intrinsic Reward in cooperative multi-agent reinforcement learning.
   Neural Netw. 2023 Oct;167:450-459. doi: 10.1016/j.neunet.2023.08.016. Epub 2023 Aug 22.
3. Strangeness-driven exploration in multi-agent reinforcement learning.
   Neural Netw. 2024 Apr;172:106149. doi: 10.1016/j.neunet.2024.106149. Epub 2024 Jan 26.
4. End-to-End Hierarchical Reinforcement Learning With Integrated Subgoal Discovery.
   IEEE Trans Neural Netw Learn Syst. 2022 Dec;33(12):7778-7790. doi: 10.1109/TNNLS.2021.3087733. Epub 2022 Nov 30.
5. MuDE: Multi-agent decomposed reward-based exploration.
   Neural Netw. 2024 Nov;179:106565. doi: 10.1016/j.neunet.2024.106565. Epub 2024 Jul 22.
6. Value-Based Subgoal Discovery and Path Planning for Reaching Long-Horizon Goals.
   IEEE Trans Neural Netw Learn Syst. 2024 Aug;35(8):10288-10300. doi: 10.1109/TNNLS.2023.3240004. Epub 2024 Aug 5.
7. Discovering Intrinsic Subgoals for Vision-and-Language Navigation via Hierarchical Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6516-6528. doi: 10.1109/TNNLS.2024.3398300. Epub 2025 Apr 4.
8. HyperComm: Hypergraph-based communication in multi-agent reinforcement learning.
   Neural Netw. 2024 Oct;178:106432. doi: 10.1016/j.neunet.2024.106432. Epub 2024 Jun 10.
9. Credit assignment with predictive contribution measurement in multi-agent reinforcement learning.
   Neural Netw. 2023 Jul;164:681-690. doi: 10.1016/j.neunet.2023.05.021. Epub 2023 May 20.
10. Hierarchical Attention Master-Slave for heterogeneous multi-agent reinforcement learning.
   Neural Netw. 2023 May;162:359-368. doi: 10.1016/j.neunet.2023.02.037. Epub 2023 Mar 4.