
Model-Free Reinforcement Learning by Embedding an Auxiliary System for Optimal Control of Nonlinear Systems.

Authors

Xu Zhenhui, Shen Tielong, Cheng Daizhan

Publication information

IEEE Trans Neural Netw Learn Syst. 2022 Apr;33(4):1520-1534. doi: 10.1109/TNNLS.2020.3042589. Epub 2022 Apr 4.

DOI:10.1109/TNNLS.2020.3042589
PMID:33347416
Abstract

In this article, a novel integral reinforcement learning (IRL) algorithm is proposed to solve the optimal control problem for continuous-time nonlinear systems with unknown dynamics. The main challenge in learning is rejecting the oscillation caused by externally added probing noise. This article addresses the issue by embedding an auxiliary trajectory, designed as an exciting signal, to learn the optimal solution. First, the auxiliary trajectory is used to decompose the state trajectory of the controlled system. Then, using the decoupled trajectories, a model-free policy iteration (PI) algorithm is developed in which the policy evaluation and policy improvement steps alternate until they converge to the optimal solution. Note that an appropriate external input is introduced at the policy improvement step to remove the need for knowledge of the input-to-state dynamics. Finally, the algorithm is implemented on an actor-critic structure, and the output weights of the critic neural network (NN) and the actor NN are updated sequentially by the least-squares method. The convergence of the algorithm and the stability of the closed-loop system are guaranteed. Two examples are given to show the effectiveness of the proposed algorithm.
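The abstract describes policy iteration in which a critic is fitted by least squares from measured trajectory data, without a model of the input dynamics. The authors' auxiliary-system construction is not reproduced here. As a minimal sketch of the general idea only, the code below applies a different, well-known technique: off-policy IRL for a scalar linear-quadratic problem. Everything in it is an assumption for illustration: the plant xdot = A*x + B*u with A = B = 1, cost weights Q = R = 1, the sum-of-sinusoids probing signal, interval length T = 0.1, the initial stabilizing gain k0 = 2, and the hypothetical helper names `probe`, `collect`, and `off_policy_pi`. A and B are used only to simulate data. Each interval yields one linear equation, p_i (x(t+T)^2 - x(t)^2) - 2 R k_{i+1} ∫ x (u + k_i x) dt = -(Q + R k_i^2) ∫ x^2 dt, which is solved for the critic weight p_i and the improved gain k_{i+1} by least squares. With these numbers the iteration should approach the Riccati solution p* = 1 + √2.

```python
import math

# Hypothetical scalar plant xdot = A*x + B*u. A and B are used ONLY to
# simulate data; the learning step below never reads them.
A, B = 1.0, 1.0
Q, R = 1.0, 1.0  # assumed cost weights: integral of Q*x^2 + R*u^2


def probe(t):
    """Assumed probing (exploration) signal: a sum of sinusoids."""
    return sum(math.sin(w * t) for w in (1.0, 3.0, 7.0, 11.0))


def collect(k0=2.0, T=0.1, n_intervals=60, dt=1e-3, x0=1.0):
    """Run the behaviour policy u = -k0*x + probe(t) and, for each interval
    of length T, record (x(t+T)^2 - x(t)^2, int x^2 dt, int x*u dt)."""
    def f(t, s):
        x = s[0]
        u = -k0 * x + probe(t)
        return (A * x + B * u, x * x, x * u)  # state + two running integrals

    def step(t, s):
        # One classical RK4 step on the augmented state.
        k1 = f(t, s)
        k2 = f(t + dt / 2, tuple(si + dt / 2 * di for si, di in zip(s, k1)))
        k3 = f(t + dt / 2, tuple(si + dt / 2 * di for si, di in zip(s, k2)))
        k4 = f(t + dt, tuple(si + dt * di for si, di in zip(s, k3)))
        return tuple(si + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                     for si, d1, d2, d3, d4 in zip(s, k1, k2, k3, k4))

    t, x = 0.0, x0
    rows = []
    for _ in range(n_intervals):
        s = (x, 0.0, 0.0)  # restart the integrals at the interval start
        for _ in range(round(T / dt)):
            s = step(t, s)
            t += dt
        rows.append((s[0] ** 2 - x ** 2, s[1], s[2]))
        x = s[0]
    return rows


def off_policy_pi(rows, k=2.0, iters=10):
    """Alternate policy evaluation and improvement using only recorded data.
    Each row gives one linear equation in the unknowns (p_i, k_{i+1}):
        p_i*dx2 - 2*R*k_{i+1}*(Ixu + k_i*Ixx) = -(Q + R*k_i**2)*Ixx
    solved here through the 2x2 normal equations (least squares)."""
    p = float("nan")
    for _ in range(iters):
        s11 = s12 = s22 = y1 = y2 = 0.0
        for dx2, ixx, ixu in rows:
            phi1, phi2 = dx2, -2.0 * R * (ixu + k * ixx)
            rhs = -(Q + R * k * k) * ixx
            s11 += phi1 * phi1
            s12 += phi1 * phi2
            s22 += phi2 * phi2
            y1 += phi1 * rhs
            y2 += phi2 * rhs
        det = s11 * s22 - s12 * s12
        p = (s22 * y1 - s12 * y2) / det  # critic weight p_i
        k = (s11 * y2 - s12 * y1) / det  # improved gain k_{i+1}
    return p, k


if __name__ == "__main__":
    p, k = off_policy_pi(collect())
    p_star = R * (A + math.sqrt(A * A + B * B * Q / R)) / (B * B)
    print(f"learned p={p:.4f}, k={k:.4f}; Riccati p*={p_star:.4f}")
```

In this sketch the probing signal is injected directly into the plant input, so the state itself oscillates; that is the situation the authors' auxiliary trajectory is designed to avoid. The example only illustrates the data-driven policy evaluation and least-squares update pattern, not the paper's decomposition.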

Similar articles

1
Model-Free Reinforcement Learning by Embedding an Auxiliary System for Optimal Control of Nonlinear Systems.
IEEE Trans Neural Netw Learn Syst. 2022 Apr;33(4):1520-1534. doi: 10.1109/TNNLS.2020.3042589. Epub 2022 Apr 4.
2
Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks.
IEEE Trans Neural Netw Learn Syst. 2013 Oct;24(10):1513-25. doi: 10.1109/TNNLS.2013.2276571.
3
Adaptive nearly optimal control for a class of continuous-time nonaffine nonlinear systems with inequality constraints.
ISA Trans. 2017 Jan;66:122-133. doi: 10.1016/j.isatra.2016.10.019. Epub 2016 Nov 9.
4
Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games.
IEEE Trans Neural Netw Learn Syst. 2017 Mar;28(3):704-713. doi: 10.1109/TNNLS.2016.2582849. Epub 2016 Jul 20.
5
A policy iteration approach to online optimal control of continuous-time constrained-input systems.
ISA Trans. 2013 Sep;52(5):611-21. doi: 10.1016/j.isatra.2013.04.004. Epub 2013 May 24.
6
Model-Free Reinforcement Learning for Fully Cooperative Consensus Problem of Nonlinear Multiagent Systems.
IEEE Trans Neural Netw Learn Syst. 2022 Apr;33(4):1482-1491. doi: 10.1109/TNNLS.2020.3042508. Epub 2022 Apr 4.
7
Adaptive Actor-Critic Design-Based Integral Sliding-Mode Control for Partially Unknown Nonlinear Systems With Input Disturbances.
IEEE Trans Neural Netw Learn Syst. 2016 Jan;27(1):165-77. doi: 10.1109/TNNLS.2015.2472974. Epub 2015 Sep 9.
8
Integral Reinforcement-Learning-Based Optimal Containment Control for Partially Unknown Nonlinear Multiagent Systems.
Entropy (Basel). 2023 Jan 23;25(2):221. doi: 10.3390/e25020221.
9
Combined control algorithm based on synchronous reinforcement learning for a self-balancing bicycle robot.
ISA Trans. 2024 Feb;145:479-492. doi: 10.1016/j.isatra.2023.11.032. Epub 2023 Nov 23.
10
Data-Based Reinforcement Learning for Nonzero-Sum Games With Unknown Drift Dynamics.
IEEE Trans Cybern. 2019 Aug;49(8):2874-2885. doi: 10.1109/TCYB.2018.2830820. Epub 2018 May 16.