
Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem.

Authors

Rizvi Syed Ali Asad, Lin Zongli

Publication

IEEE Trans Neural Netw Learn Syst. 2019 May;30(5):1523-1536. doi: 10.1109/TNNLS.2018.2870075. Epub 2018 Oct 8.

DOI: 10.1109/TNNLS.2018.2870075
PMID: 30296242
Abstract

Approximate dynamic programming (ADP) and reinforcement learning (RL) have emerged as important tools in the design of optimal and adaptive control systems. Most of the existing RL and ADP methods make use of full-state feedback, a requirement that is often difficult to satisfy in practical applications. As a result, output feedback methods are more desirable as they relax this requirement. In this paper, we present a new output feedback-based Q-learning approach to solving the linear quadratic regulation (LQR) control problem for discrete-time systems. The proposed scheme is completely online in nature and works without requiring the system dynamics information. More specifically, a new representation of the LQR Q-function is developed in terms of the input-output data. Based on this new Q-function representation, output feedback LQR controllers are designed. We present two output feedback iterative Q-learning algorithms based on the policy iteration and the value iteration methods. This scheme has the advantage that it does not incur any excitation noise bias, and therefore, the need of using discounted cost functions is circumvented, which in turn ensures closed-loop stability. It is shown that the proposed algorithms converge to the solution of the LQR Riccati equation. A comprehensive simulation study is carried out, which illustrates the proposed scheme.
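The policy-iteration Q-learning idea the abstract describes can be illustrated with a minimal state-feedback sketch. This is a simplification, not the paper's method: the paper works from input-output data only, whereas this sketch assumes the full state is measurable. The plant matrices below are hypothetical and are used only to simulate data; the learner itself never reads A or B, it sees only state, input, and stage-cost samples. Note how the Bellman target uses the policy action K @ x_next rather than a noisy applied action, which is the standard way to keep the exploration noise from biasing the regression.

```python
import numpy as np

# Hypothetical stable plant, used only to *simulate* data; the learner never
# reads A or B directly -- it sees only (x_k, u_k, cost_k, x_{k+1}) samples.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Qc, Rc = np.eye(2), np.eye(1)
n, m = 2, 1

def features(z):
    """Quadratic features so that z' H z = features(z) @ vech(H)."""
    i, j = np.triu_indices(len(z))
    w = np.where(i == j, 1.0, 2.0)      # off-diagonal products appear twice
    return w * np.outer(z, z)[i, j]

def vech_to_sym(theta, d):
    """Rebuild the symmetric matrix H from its upper-triangular parameters."""
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = theta
    return H + H.T - np.diag(np.diag(H))

rng = np.random.default_rng(0)
K = np.zeros((m, n))                    # initial stabilizing policy u = K x

for it in range(8):                     # policy iteration
    Phi, c = [], []
    x = rng.standard_normal(n)
    for k in range(400):                # collect policy-evaluation data
        u = K @ x + 0.5 * rng.standard_normal(m)   # exploration noise
        x_next = A @ x + B @ u
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, K @ x_next])  # policy action: no noise bias
        # Bellman equation: Q(z_k) - Q(z_{k+1}^pi) = stage cost
        Phi.append(features(z) - features(z_next))
        c.append(x @ Qc @ x + u @ Rc @ u)
        x = x_next
        if k % 40 == 39:                # occasional restarts diversify the data
            x = rng.standard_normal(n)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    H = vech_to_sym(theta, n + m)
    # policy improvement: argmin_u Q(x, u) gives u = -(H_uu)^{-1} H_ux x
    K = -np.linalg.solve(H[n:, n:], H[n:, :n])

# Sanity check against the Riccati solution (fixed-point iteration on the DARE)
P = Qc.copy()
for _ in range(500):
    P = A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
        Rc + B.T @ P @ B, B.T @ P @ A) + Qc
K_star = -np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)  # optimal u = K_star x
print(np.round(K, 4), np.round(K_star, 4))  # the two gains should nearly coincide
```

Because the system is deterministic and the Q-function of a linear policy is exactly quadratic, the least-squares fit recovers H exactly once the regression is full rank, so the iteration inherits the fast convergence of Riccati-equation policy iteration. The paper's output-feedback algorithms follow the same evaluate-then-improve loop, but parameterize the Q-function over a window of past inputs and outputs instead of the state.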


Similar Articles

1. Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem.
IEEE Trans Neural Netw Learn Syst. 2019 May;30(5):1523-1536. doi: 10.1109/TNNLS.2018.2870075. Epub 2018 Oct 8.
2. Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback.
IEEE Trans Cybern. 2019 Jan 3. doi: 10.1109/TCYB.2018.2886735.
3. Optimal Tracking Control of Unknown Discrete-Time Linear Systems Using Input-Output Measured Data.
IEEE Trans Cybern. 2015 Dec;45(12):2770-9. doi: 10.1109/TCYB.2014.2384016. Epub 2015 Jan 6.
4. Reinforcement Learning for Partially Observable Dynamic Processes: Adaptive Dynamic Programming Using Measured Output Data.
IEEE Trans Syst Man Cybern B Cybern. 2011 Feb;41(1):14-25. doi: 10.1109/TSMCB.2010.2043839. Epub 2010 Mar 29.
5. Continuous-Time Q-Learning for Infinite-Horizon Discounted Cost Linear Quadratic Regulator Problems.
IEEE Trans Cybern. 2015 Feb;45(2):165-76. doi: 10.1109/TCYB.2014.2322116. Epub 2014 May 29.
6. Optimal Output-Feedback Control of Unknown Continuous-Time Linear Systems Using Off-Policy Reinforcement Learning.
IEEE Trans Cybern. 2016 Nov;46(11):2401-2410. doi: 10.1109/TCYB.2015.2477810. Epub 2016 Sep 22.
7. Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):943-9. doi: 10.1109/TSMCB.2008.926614.
8. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning.
IEEE Trans Neural Netw. 2007 Jul;18(4):973-92. doi: 10.1109/TNN.2007.899161.
9. Model-Free Q-Learning for the Tracking Problem of Linear Discrete-Time Systems.
IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):3191-3201. doi: 10.1109/TNNLS.2022.3195357. Epub 2024 Feb 29.
10. Optimal Output Regulation of Linear Discrete-Time Systems With Unknown Dynamics Using Reinforcement Learning.
IEEE Trans Cybern. 2020 Jul;50(7):3147-3156. doi: 10.1109/TCYB.2018.2890046. Epub 2019 Jan 25.