

Model-Free Reinforcement Learning for Fully Cooperative Consensus Problem of Nonlinear Multiagent Systems.

Author Information

Wang Hong, Li Man

Publication Information

IEEE Trans Neural Netw Learn Syst. 2022 Apr;33(4):1482-1491. doi: 10.1109/TNNLS.2020.3042508. Epub 2022 Apr 4.

Abstract

This article presents an off-policy model-free algorithm based on reinforcement learning (RL) to optimize the fully cooperative (FC) consensus problem of nonlinear continuous-time multiagent systems (MASs). First, the optimal FC consensus problem is transformed into solving the coupled Hamilton-Jacobi-Bellman (HJB) equation. We then propose a policy iteration (PI)-based algorithm and prove that it is effective in solving the coupled HJB equation. To implement this scheme in a model-free way, a model-free Bellman equation is derived to find the optimal value function and the optimal control policy for each agent. Based on the least-squares approach, the tuning law for the actor and critic weights is then derived by incorporating actor and critic neural networks into the model-free Bellman equation to approximate the target policies and the value function. Finally, we propose an off-policy model-free integral RL (IRL) algorithm, which can optimize the FC consensus problem of the whole system in real time using measured data. The effectiveness of the proposed algorithm is verified by simulation results.
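To make the ingredients named in the abstract concrete, the LaTeX sketch below writes out the generic continuous-time forms that such schemes build on: a common value function shared by all agents in the fully cooperative setting, the coupled HJB condition, and the IRL Bellman equation evaluated over a data window. This is a sketch under assumed affine dynamics; all symbols (x, u_i, f, g_i, r, V, T) are illustrative conventions from the IRL literature, not the article's own notation.

\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Assumed affine agent dynamics for illustration:
% \dot{x} = f(x) + \sum_{i=1}^{N} g_i(x) u_i.

% In the fully cooperative setting, all N agents share one value function
% evaluated along the joint policy (u_1, ..., u_N):
\[
V\bigl(x(t)\bigr) = \int_{t}^{\infty}
  r\bigl(x(\tau), u_1(\tau), \dots, u_N(\tau)\bigr)\, d\tau .
\]

% Coupled Hamilton--Jacobi--Bellman (HJB) condition for the optimal policies:
\[
0 = \min_{u_1, \dots, u_N}
  \Bigl[\, r(x, u_1, \dots, u_N)
  + \nabla V^{\top}\bigl(f(x)
  + \textstyle\sum_{i=1}^{N} g_i(x)\, u_i \bigr) \Bigr].
\]

% Integral RL (IRL) Bellman equation over a window T > 0: it involves only
% measured state and input data, so f and g_i never appear explicitly,
% which is what makes a model-free implementation possible:
\[
V\bigl(x(t)\bigr) = \int_{t}^{t+T}
  r\bigl(x(\tau), u_1(\tau), \dots, u_N(\tau)\bigr)\, d\tau
  + V\bigl(x(t+T)\bigr).
\]

\end{document}

Because the interval form replaces the dynamics with measured trajectory data, fitting neural-network weight parameterizations of V and the policies against many such intervals by least squares yields a data-driven update, consistent with the model-free, real-time implementation the abstract describes.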

