
FMRQ-A Multiagent Reinforcement Learning Algorithm for Fully Cooperative Tasks.

Publication information

IEEE Trans Cybern. 2017 Jun;47(6):1367-1379. doi: 10.1109/TCYB.2016.2544866. Epub 2016 Apr 14.

Abstract

In this paper, we propose a multiagent reinforcement learning algorithm for fully cooperative tasks, called frequency of the maximum reward Q-learning (FMRQ). FMRQ aims to reach one of the optimal Nash equilibria so as to optimize the performance index of the multiagent system. The frequency of obtaining the highest global immediate reward, rather than the immediate reward itself, is used as the reinforcement signal. With FMRQ, each agent does not need to observe the other agents' actions and only shares its state and reward at each step. We validate FMRQ through case studies of repeated games: four two-player two-action cases and one three-player two-action case. It is demonstrated that FMRQ converges to one of the optimal Nash equilibria in these cases. Moreover, comparison experiments are conducted on tasks with multiple states and finite steps: one is a box-pushing task and the other is a distributed sensor network problem. Experimental results show that the proposed algorithm outperforms the compared algorithms.
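The core idea above can be illustrated with a toy sketch. The snippet below is not the paper's FMRQ update; it is a simplified reading of its reinforcement signal, assumed here for illustration: two independent learners play a repeated two-player two-action coordination game, and each agent scores its own actions by the frequency with which they produced the maximum global reward instead of by the raw reward. The payoff matrix, the `FreqAgent` class, and the exploration scheme are all hypothetical choices, not from the paper.

```python
import random

# Shared (global) payoff for a two-player coordination game.
# Joint action (0, 0) is the optimal Nash equilibrium.
PAYOFF = [[11, -30],
          [-30, 7]]
R_MAX = 11  # highest global immediate reward

class FreqAgent:
    """Independent learner that ranks actions by the frequency with
    which they yielded the maximum global reward (simplified signal)."""
    def __init__(self, n_actions=2, eps=0.1):
        self.plays = [0] * n_actions  # times each action was taken
        self.hits = [0] * n_actions   # times it yielded the max reward
        self.eps = eps

    def greedy(self):
        freqs = [h / p if p else 0.0
                 for h, p in zip(self.hits, self.plays)]
        return max(range(len(freqs)), key=freqs.__getitem__)

    def act(self):
        if random.random() < self.eps:
            return random.randrange(len(self.plays))
        return self.greedy()

    def update(self, action, reward):
        self.plays[action] += 1
        if reward == R_MAX:
            self.hits[action] += 1

random.seed(0)
a, b = FreqAgent(), FreqAgent()
for _ in range(2000):
    ia, ib = a.act(), b.act()
    r = PAYOFF[ia][ib]  # both agents receive the same global reward
    a.update(ia, r)
    b.update(ib, r)

print(a.greedy(), b.greedy())
```

Note why the frequency signal matters in this game: averaging raw rewards would penalize action 0 for the occasional -30 caused by the other agent's exploration, whereas action 1 never produces the maximum reward at all, so its frequency stays at zero and both agents settle on the optimal joint action (0, 0).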

