

Model-Free Reinforcement Learning for Fully Cooperative Consensus Problem of Nonlinear Multiagent Systems.

Author Information

Wang Hong, Li Man

Publication Information

IEEE Trans Neural Netw Learn Syst. 2022 Apr;33(4):1482-1491. doi: 10.1109/TNNLS.2020.3042508. Epub 2022 Apr 4.

DOI:10.1109/TNNLS.2020.3042508
PMID:33338022
Abstract

This article presents an off-policy model-free algorithm based on reinforcement learning (RL) to optimize the fully cooperative (FC) consensus problem of nonlinear continuous-time multiagent systems (MASs). First, the optimal FC consensus problem is transformed into solving the coupled Hamilton-Jacobian-Bellman (HJB) equation. Then, we propose a policy iteration (PI)-based algorithm, which is further proved to be effective to solve the coupled HJB equation. To implement this scheme in a model-free way, a model-free Bellman equation is derived to find the optimal value function and the optimal control policy for each agent. Then, based on the least-squares approach, the tuning law for actor and critic weights is derived by employing actor and critic neural networks into the model-free Bellman equation to approximate the target policies and the value function. Finally, we propose an off-policy model-free integral RL (IRL) algorithm, which can be used to optimize the FC consensus problem of the whole system in real time by using measured data. The effectiveness of this proposed algorithm is verified by the simulation results.
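The loop described in the abstract (policy evaluation by least squares on a model-free Bellman equation, then policy improvement) can be illustrated in a much simpler setting. The sketch below is not the authors' continuous-time multiagent IRL algorithm: it is a scalar, discrete-time, single-agent LQR analogue using Q-learning-style policy iteration, where a quadratic Q-function plays the role of the critic and the greedy gain the role of the actor. The dynamics (a, b), costs (q, r), exploration noise, and sample counts are arbitrary illustrative choices; the true dynamics are used only to generate data, never by the learner.

```python
import numpy as np

# Scalar discrete-time LQR, solved model-free by least-squares policy iteration.
a, b = 0.9, 1.0          # true dynamics x' = a*x + b*u (data generation only)
q, r = 1.0, 1.0          # stage cost q*x^2 + r*u^2

rng = np.random.default_rng(0)
K = 0.0                  # initial stabilizing policy u = -K*x

for _ in range(20):
    # Policy evaluation: fit Q(x,u) = w1*x^2 + w2*x*u + w3*u^2 by least squares
    # on the model-free Bellman equation Q(x,u) = cost(x,u) + Q(x', u'(x')),
    # with data from an exploratory behavior policy (off-policy flavor).
    Phi, y = [], []
    x = 1.0
    for _ in range(200):
        u = -K * x + 0.1 * rng.standard_normal()   # behavior policy + exploration
        xn = a * x + b * u                          # measured next state
        un = -K * xn                                # target policy at next state
        Phi.append([x*x - xn*xn, x*u - xn*un, u*u - un*un])
        y.append(q*x*x + r*u*u)
        x = xn if abs(xn) < 10 else rng.standard_normal()
    w = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)[0]
    # Policy improvement: u = argmin_u Q(x,u)  =>  u = -(w2 / (2*w3)) * x
    K = w[1] / (2 * w[2])

# Cross-check against the Riccati solution (uses the model; verification only).
p = q
for _ in range(1000):
    p = q + a*a*p - (a*b*p)**2 / (r + b*b*p)
K_opt = a*b*p / (r + b*b*p)
print(K, K_opt)
```

The learned gain K converges to the Riccati-optimal gain even though the fitting step never touches a or b directly, which is the essence of the model-free claim; the paper's contribution is extending this idea to coupled HJB equations over a network of nonlinear continuous-time agents with neural-network function approximation.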


Similar Articles

1. Model-Free Reinforcement Learning for Fully Cooperative Consensus Problem of Nonlinear Multiagent Systems. IEEE Trans Neural Netw Learn Syst. 2022 Apr;33(4):1482-1491. doi: 10.1109/TNNLS.2020.3042508. Epub 2022 Apr 4.
2. Cooperative Differential Game-Based Distributed Optimal Synchronization Control of Heterogeneous Nonlinear Multiagent Systems. IEEE Trans Cybern. 2023 Dec;53(12):7933-7942. doi: 10.1109/TCYB.2023.3240983. Epub 2023 Nov 29.
3. Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games. IEEE Trans Neural Netw Learn Syst. 2017 Oct;28(10):2434-2445. doi: 10.1109/TNNLS.2016.2609500. Epub 2017 Apr 17.
4. Data-Based Optimal Synchronization of Heterogeneous Multiagent Systems in Graphical Games via Reinforcement Learning. IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):15984-15992. doi: 10.1109/TNNLS.2023.3291542. Epub 2024 Oct 29.
5. Reinforcement learning solution for HJB equation arising in constrained optimal control problem. Neural Netw. 2015 Nov;71:150-8. doi: 10.1016/j.neunet.2015.08.007. Epub 2015 Aug 24.
6. Integral Reinforcement-Learning-Based Optimal Containment Control for Partially Unknown Nonlinear Multiagent Systems. Entropy (Basel). 2023 Jan 23;25(2):221. doi: 10.3390/e25020221.
7. Finite-Horizon Optimal Consensus Control for Unknown Multiagent State-Delay Systems. IEEE Trans Cybern. 2020 Feb;50(2):402-413. doi: 10.1109/TCYB.2018.2856510. Epub 2018 Sep 10.
8. Optimal Synchronization Control of Multiagent Systems With Input Saturation via Off-Policy Reinforcement Learning. IEEE Trans Neural Netw Learn Syst. 2019 Jan;30(1):85-96. doi: 10.1109/TNNLS.2018.2832025. Epub 2018 May 24.
9. A policy iteration approach to online optimal control of continuous-time constrained-input systems. ISA Trans. 2013 Sep;52(5):611-21. doi: 10.1016/j.isatra.2013.04.004. Epub 2013 May 24.
10. Policy-Iteration-Based Finite-Horizon Approximate Dynamic Programming for Continuous-Time Nonlinear Optimal Control. IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5255-5267. doi: 10.1109/TNNLS.2022.3225090. Epub 2023 Sep 1.

Cited By

1. Closed loop iterative learning control for consistency tracking in lower limb rehabilitation robotic system with initial state deviations. Sci Rep. 2025 Mar 20;15(1):9593. doi: 10.1038/s41598-025-92197-0.