
Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation.

Authors

Salimibeni Mohammad, Mohammadi Arash, Malekzadeh Parvin, Plataniotis Konstantinos N

Affiliations

Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC H3G 1M8, Canada.

Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G8, Canada.

Publication

Sensors (Basel). 2022 Feb 11;22(4):1393. doi: 10.3390/s22041393.

Abstract

Development of distributed Multi-Agent Reinforcement Learning (MARL) algorithms has attracted a surge of interest in recent years. Generally speaking, conventional Model-Based (MB) or Model-Free (MF) RL algorithms are not directly applicable to MARL problems because they rely on a fixed reward model for learning the underlying value function. While Deep Neural Network (DNN)-based solutions perform well, they remain prone to overfitting, high sensitivity to parameter selection, and sample inefficiency. In this paper, an adaptive Kalman Filter (KF)-based framework is introduced as an efficient alternative that addresses the aforementioned problems by capitalizing on unique characteristics of the KF, such as uncertainty modeling and online second-order learning. More specifically, the paper proposes the Multi-Agent Adaptive Kalman Temporal Difference (MAK-TD) framework and its Successor Representation-based variant, referred to as MAK-SR. The proposed MAK-TD/SR frameworks account for the continuous nature of the action space associated with high-dimensional multi-agent environments and exploit Kalman Temporal Difference (KTD) to address parameter uncertainty. The proposed MAK-TD/SR frameworks are evaluated via several experiments implemented on the OpenAI Gym MARL benchmarks, using different numbers of agents in cooperative, competitive, and mixed (cooperative-competitive) scenarios. The experimental results demonstrate the superior performance of the proposed MAK-TD/SR frameworks compared to their state-of-the-art counterparts.
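To make the two building blocks named in the abstract concrete, below is a minimal, illustrative sketch of (a) a Kalman Temporal Difference update on linear value-function weights and (b) a tabular successor-representation TD update. This is a single-agent toy sketch under simplifying assumptions (linear features, scalar observation noise, hand-picked noise parameters), not the paper's MAK-TD/SR implementation; all function and parameter names here are hypothetical.

```python
import numpy as np

def ktd_update(theta, P, phi_s, phi_s_next, reward, gamma=0.9,
               process_noise=1e-3, obs_noise=1.0):
    """One Kalman Temporal Difference step on linear value weights theta.

    The weights are treated as the hidden state of a Kalman filter with
    observation model: reward ~ (phi(s) - gamma * phi(s'))^T theta + noise,
    so the TD error plays the role of the Kalman innovation.
    """
    H = phi_s - gamma * phi_s_next               # observation (row) vector
    P = P + process_noise * np.eye(len(theta))   # covariance prediction step
    innovation = reward - H @ theta              # TD error as innovation
    S = H @ P @ H + obs_noise                    # scalar innovation variance
    K = P @ H / S                                # Kalman gain
    theta = theta + K * innovation               # weight update
    P = P - np.outer(K, H) @ P                   # covariance correction
    return theta, P

def sr_update(M, s, s_next, alpha=0.1, gamma=0.9):
    """Tabular successor-representation TD update: row M[s] tracks the
    expected discounted future occupancy of every state starting from s."""
    onehot = np.eye(M.shape[0])[s]
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    return M
```

The KTD update's second-order character comes from the covariance matrix `P`, which both scales the learning rate per weight and provides the uncertainty estimate the abstract refers to; the SR matrix `M` decouples transition structure from the (possibly changing) reward model.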


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a693/8962978/39418aa4e81a/sensors-22-01393-g001.jpg
