LJIR: Learning Joint-Action Intrinsic Reward in cooperative multi-agent reinforcement learning.

Authors

Chen Zihan, Luo Biao, Hu Tianmeng, Xu Xiaodong

Affiliation

School of Automation, Central South University, Changsha 410083, China.

Publication

Neural Netw. 2023 Oct;167:450-459. doi: 10.1016/j.neunet.2023.08.016. Epub 2023 Aug 22.

Abstract

Effective exploration is key to achieving high returns in reinforcement learning. In multi-agent systems, agents must explore jointly to find the optimal joint policy. Because of this exploration problem and the shared reward, policy-based multi-agent reinforcement learning (MARL) algorithms suffer from policy overfitting, which can cause the joint policy to fall into a local optimum. This paper introduces a general framework, Learning Joint-Action Intrinsic Reward (LJIR), that improves the joint exploration ability and performance of multi-agent reinforcement learners. LJIR observes the agents' state and joint actions and learns online to construct an intrinsic reward that guides effective joint exploration. Through a novel combination of a Transformer and random network distillation, LJIR assigns larger intrinsic rewards to novel states, which helps agents find the best joint actions. LJIR dynamically adjusts the balance between exploration and exploitation during training and ultimately preserves policy invariance. To let existing MARL algorithms adopt LJIR seamlessly, we also provide a flexible method for combining intrinsic and external rewards. Empirical results on the SMAC benchmark show that the proposed method achieves state-of-the-art performance on challenging tasks.
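
The abstract sketches the mechanism at a high level: a randomly initialized, frozen target network and a trained predictor network score the novelty of joint state-action pairs, and the resulting prediction error is added to the external reward with a weight that is annealed during training. The sketch below illustrates that random-network-distillation idea; the names (`RNDIntrinsicReward`, `shaped_reward`), the MLP encoders standing in for the paper's Transformer, and the linear reward mix are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an RND-style intrinsic reward over (state, joint action),
# assuming PyTorch. All shapes and names are illustrative.
import torch
import torch.nn as nn

class RNDIntrinsicReward(nn.Module):
    def __init__(self, state_dim: int, joint_action_dim: int, embed_dim: int = 128):
        super().__init__()
        def make_net() -> nn.Sequential:
            return nn.Sequential(
                nn.Linear(state_dim + joint_action_dim, 256),
                nn.ReLU(),
                nn.Linear(256, embed_dim),
            )
        self.target = make_net()     # fixed random network, never trained
        self.predictor = make_net()  # trained to imitate the target
        for p in self.target.parameters():
            p.requires_grad_(False)

    def forward(self, state: torch.Tensor, joint_action: torch.Tensor) -> torch.Tensor:
        x = torch.cat([state, joint_action], dim=-1)
        with torch.no_grad():
            target_feat = self.target(x)
        pred_feat = self.predictor(x)
        # The prediction error is large on rarely visited (state, joint-action)
        # pairs, so it serves as an exploration bonus; minimizing this same
        # error is the predictor's training loss.
        return (pred_feat - target_feat).pow(2).mean(dim=-1)

def shaped_reward(r_ext: torch.Tensor, r_int: torch.Tensor, beta: float) -> torch.Tensor:
    # Annealing beta toward 0 over training recovers the external reward,
    # so the optimal policy is unchanged at the end (policy invariance).
    return r_ext + beta * r_int

if __name__ == "__main__":
    rnd = RNDIntrinsicReward(state_dim=48, joint_action_dim=10)
    state = torch.randn(32, 48)         # batch of global states
    joint_action = torch.randn(32, 10)  # e.g. concatenated one-hot actions
    r_int = rnd(state, joint_action)    # exploration bonus, shape (32,)
    r = shaped_reward(torch.zeros(32), r_int, beta=0.5)
```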

Similar Articles

1. LJIR: Learning Joint-Action Intrinsic Reward in cooperative multi-agent reinforcement learning. Neural Netw. 2023 Oct;167:450-459. doi: 10.1016/j.neunet.2023.08.016. Epub 2023 Aug 22.
2. Strangeness-driven exploration in multi-agent reinforcement learning. Neural Netw. 2024 Apr;172:106149. doi: 10.1016/j.neunet.2024.106149. Epub 2024 Jan 26.
3. MuDE: Multi-agent decomposed reward-based exploration. Neural Netw. 2024 Nov;179:106565. doi: 10.1016/j.neunet.2024.106565. Epub 2024 Jul 22.
4. Generative subgoal oriented multi-agent reinforcement learning through potential field. Neural Netw. 2024 Nov;179:106552. doi: 10.1016/j.neunet.2024.106552. Epub 2024 Jul 17.
5. A Hybrid Online Off-Policy Reinforcement Learning Agent Framework Supported by Transformers. Int J Neural Syst. 2023 Dec;33(12):2350065. doi: 10.1142/S012906572350065X. Epub 2023 Oct 20.
6. Multi-agent Continuous Control with Generative Flow Networks. Neural Netw. 2024 Jun;174:106243. doi: 10.1016/j.neunet.2024.106243. Epub 2024 Mar 20.
7. Optimistic sequential multi-agent reinforcement learning with motivational communication. Neural Netw. 2024 Nov;179:106547. doi: 10.1016/j.neunet.2024.106547. Epub 2024 Jul 22.
8. An off-policy multi-agent stochastic policy gradient algorithm for cooperative continuous control. Neural Netw. 2024 Feb;170:610-621. doi: 10.1016/j.neunet.2023.11.046. Epub 2023 Nov 23.
9. Credit assignment with predictive contribution measurement in multi-agent reinforcement learning. Neural Netw. 2023 Jul;164:681-690. doi: 10.1016/j.neunet.2023.05.021. Epub 2023 May 20.
10. Curiosity-driven recommendation strategy for adaptive learning via deep reinforcement learning. Br J Math Stat Psychol. 2020 Nov;73(3):522-540. doi: 10.1111/bmsp.12199. Epub 2020 Feb 21.
