Mohammad Salimibeni, Arash Mohammadi, Parvin Malekzadeh, Konstantinos N. Plataniotis
Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC H3G 1M8, Canada.
Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G8, Canada.
Sensors (Basel). 2022 Feb 11;22(4):1393. doi: 10.3390/s22041393.
The development of distributed Multi-Agent Reinforcement Learning (MARL) algorithms has recently attracted a surge of interest. Generally speaking, conventional Model-Based (MB) or Model-Free (MF) RL algorithms are not directly applicable to MARL problems because they rely on a fixed reward model to learn the underlying value function. While Deep Neural Network (DNN)-based solutions perform well, they remain prone to overfitting, high sensitivity to parameter selection, and sample inefficiency. In this paper, an adaptive Kalman Filter (KF)-based framework is introduced as an efficient alternative that addresses the aforementioned problems by capitalizing on unique characteristics of the KF, such as uncertainty modeling and online second-order learning. More specifically, the paper proposes the Multi-Agent Adaptive Kalman Temporal Difference (MAK-TD) framework and its Successor Representation-based variant, referred to as MAK-SR. The proposed MAK-TD/SR frameworks account for the continuous nature of the action space associated with high-dimensional multi-agent environments and exploit Kalman Temporal Difference (KTD) to address parameter uncertainty. The proposed MAK-TD/SR frameworks are evaluated via several experiments implemented on the OpenAI Gym MARL benchmarks, using varying numbers of agents in cooperative, competitive, and mixed (cooperative-competitive) scenarios. The experimental results illustrate the superior performance of the proposed MAK-TD/SR frameworks compared to their state-of-the-art counterparts.
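To make the Kalman Temporal Difference idea behind the abstract concrete, the sketch below implements a single-agent KTD update with linear value-function approximation on a toy two-state chain. The filter treats the value weights as the hidden state, with the reward as the observation through the TD feature h = phi(s) - gamma * phi(s'), so it tracks both a point estimate and its uncertainty. The feature map, noise variances, and the toy environment are illustrative assumptions for this sketch, not the paper's MAK-TD/SR algorithms or experimental setup.

```python
GAMMA = 0.9        # discount factor (assumed for this sketch)
PROC_NOISE = 1e-4  # process noise added to the weight covariance each step
OBS_NOISE = 1.0    # assumed variance of the reward observation noise


def mat_vec(M, v):
    """Matrix-vector product for plain nested lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def ktd_update(theta, P, phi_s, phi_next, reward, done):
    """One Kalman-filter update of the linear value weights `theta`.

    Observation model: reward ~ h^T theta + noise, with the TD feature
    h = phi(s) - gamma * phi(s'), so the filter maintains both a point
    estimate (theta) and its uncertainty (covariance P).
    """
    n = len(theta)
    g = 0.0 if done else GAMMA
    h = [phi_s[i] - g * phi_next[i] for i in range(n)]
    # Predict step: inflate the covariance with process noise.
    P = [[P[i][j] + (PROC_NOISE if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    Ph = mat_vec(P, h)
    s = sum(h[i] * Ph[i] for i in range(n)) + OBS_NOISE  # innovation variance
    K = [Ph[i] / s for i in range(n)]                    # Kalman gain
    innovation = reward - sum(h[i] * theta[i] for i in range(n))
    theta = [theta[i] + K[i] * innovation for i in range(n)]
    # Correct step: P <- P - K h^T P (P is symmetric, so h^T P equals Ph).
    P = [[P[i][j] - K[i] * Ph[j] for j in range(n)] for i in range(n)]
    return theta, P


# Toy deterministic chain: state 0 -> state 1 (reward 0), state 1 -> terminal
# (reward 1). With one-hot features the true values are V(1)=1, V(0)=0.9.
phi = {0: [1.0, 0.0], 1: [0.0, 1.0]}
theta = [0.0, 0.0]
P = [[10.0, 0.0], [0.0, 10.0]]  # large prior uncertainty on the weights
for _ in range(200):
    theta, P = ktd_update(theta, P, phi[0], phi[1], 0.0, False)
    theta, P = ktd_update(theta, P, phi[1], phi[0], 1.0, True)
print(theta)  # approaches [0.9, 1.0]
```

Because the filter carries a full covariance over the weights, it performs second-order, uncertainty-aware updates without a hand-tuned learning rate, which is the property the abstract highlights as the advantage of KF-based learning.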