Generative subgoal oriented multi-agent reinforcement learning through potential field.

Affiliation

Academy of Military Science, Beijing, 100000, China.

Publication information

Neural Netw. 2024 Nov;179:106552. doi: 10.1016/j.neunet.2024.106552. Epub 2024 Jul 17.

Abstract

Multi-agent reinforcement learning (MARL) guided by subgoals effectively improves the learning speed of agents in sparse-reward tasks. However, existing works break the consistency between the learning objectives of the subgoal generation stage and the subgoal reaching stage, which significantly limits the effectiveness of subgoal learning. To address this problem, we propose a novel Potential field Subgoal-based Multi-Agent reinforcement learning (PSMA) method, which introduces the potential field (PF) to unify the learning objectives of the two stages. Specifically, we design a state-to-PF representation model that describes agents' states as potential fields, allowing easy measurement of the interaction effects of both allied and enemy agents. With the PF representation, a subgoal selector is designed to automatically generate multiple subgoals for each agent, drawn from an experience replay buffer that contains both individual and total PF values. Based on the determined subgoals, we define an intrinsic reward function that guides each agent to reach its respective subgoals while maximizing the joint action-value. Experimental results show that our method outperforms the state-of-the-art MARL method on both StarCraft II micro-management (SMAC) and Google Research Football (GRF) tasks with sparse reward settings.
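The abstract gives no formulas, so the following is only a toy sketch of the three ideas it names (a potential-field representation, subgoal selection from a buffer of PF values, and an intrinsic reward toward a subgoal), not the PSMA implementation. Every name here (`potential`, `select_subgoals`, `intrinsic_reward`), the inverse-distance field, the "allies add, enemies subtract" sign convention, the "highest stored PF value" selection rule, and the use of generic potential-based reward shaping are assumptions made for illustration.

```python
import math

def potential(pos, allies, enemies, eps=1.0):
    """Toy scalar potential at `pos` (hypothetical form, not from the paper).

    Assumed convention: each ally adds 1/(eps + distance), each enemy subtracts it.
    """
    def contrib(other):
        return 1.0 / (eps + math.dist(pos, other))
    return sum(contrib(a) for a in allies) - sum(contrib(e) for e in enemies)

def select_subgoals(buffer, k=2):
    """Return the k buffered positions with the highest stored PF value.

    `buffer` is a list of (position, pf_value) pairs; the ranking rule is an assumption.
    """
    ranked = sorted(buffer, key=lambda item: item[1], reverse=True)
    return [pos for pos, _ in ranked[:k]]

def intrinsic_reward(pos, next_pos, subgoal, gamma=0.99):
    """Generic potential-based shaping: gamma * phi(s') - phi(s), with phi = -distance to subgoal."""
    def phi(p):
        return -math.dist(p, subgoal)
    return gamma * phi(next_pos) - phi(pos)

# Tiny 2-D example with one ally and one enemy (made-up coordinates).
allies = [(0.0, 0.0)]
enemies = [(4.0, 0.0)]
visited = [(1.0, 0.0), (3.0, 0.0), (0.0, 1.0)]

# Stand-in for a replay buffer that stores a PF value alongside each visited position.
buffer = [(p, potential(p, allies, enemies)) for p in visited]
subgoals = select_subgoals(buffer, k=1)

# Moving toward the selected subgoal is rewarded; moving away is penalized.
reward_toward = intrinsic_reward((2.0, 0.0), (1.5, 0.0), subgoals[0])
reward_away = intrinsic_reward((2.0, 0.0), (2.5, 0.0), subgoals[0])
```

In this toy setup the same potential values drive both the choice of subgoal and the reward for approaching it, which loosely mirrors the abstract's point about unifying the two stages; how PSMA actually couples them is not stated in the abstract.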
