IEEE Trans Cybern. 2015 Feb;45(2):165-76. doi: 10.1109/TCYB.2014.2322116. Epub 2014 May 29.
This paper presents a Q-learning method to solve the discounted linear quadratic regulator (LQR) problem for continuous-time (CT) continuous-state systems. Most methods in the existing literature for solving the LQR problem for CT systems require partial or complete knowledge of the system dynamics. Q-learning is effective for unknown dynamical systems, but has generally been well understood only for discrete-time systems. The contribution of this paper is a Q-learning methodology for CT systems that solves the LQR problem without any knowledge of the system dynamics. A natural and rigorously justified parameterization of the Q-function is given in terms of the state, the control input, and its derivatives. This parameterization enables the implementation of an online Q-learning algorithm for CT systems. Simulation results supporting the theoretical development are also presented.
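To make the general idea concrete, the following is a minimal sketch of model-free Q-learning for LQR in the standard *discrete-time* setting, which the abstract cites as the well-understood case; it is an illustrative analogue, not the paper's CT algorithm. The Q-function of a linear policy is quadratic, `Q(x,u) = [x;u]ᵀ H [x;u]`, so `H` can be fit by least squares from sampled transitions using only observed states, inputs, and stage costs; the matrices `A`, `B` below are used solely to simulate data, never by the learner. All names (`q_learning_lqr`, iteration counts, the test system) are chosen for this sketch.

```python
import numpy as np

def q_learning_lqr(A, B, Q, R, gamma=0.9, iters=20, samples=400, seed=0):
    """Model-free policy-iteration Q-learning for the discounted
    discrete-time LQR problem. The learner sees only (x, u, cost, x'),
    never the dynamics matrices A and B."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    K = np.zeros((m, n))          # initial policy u = K x
    p = n + m
    for _ in range(iters):
        Phi, y = [], []
        for _ in range(samples):
            x = rng.standard_normal(n)
            u = K @ x + rng.standard_normal(m)   # exploratory input
            x1 = A @ x + B @ u                   # simulated transition
            u1 = K @ x1                          # on-policy next input
            z = np.concatenate([x, u])
            z1 = np.concatenate([x1, u1])
            # Bellman residual basis: Q(z) - gamma*Q(z1) = stage cost
            Phi.append(np.outer(z, z).ravel() - gamma * np.outer(z1, z1).ravel())
            y.append(x @ Q @ x + u @ R @ u)
        h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
        H = h.reshape(p, p)
        H = 0.5 * (H + H.T)                      # symmetrize the Q-matrix
        Huu, Hux = H[n:, n:], H[n:, :n]
        K = -np.linalg.solve(Huu, Hux)           # greedy policy improvement
    return K, H
```

The learned gain can be checked against the exact solution of the discounted discrete-time Riccati equation; on a small stable system the two agree closely after a few policy-iteration sweeps. The paper's contribution is to obtain an analogous data-driven scheme directly in continuous time, where no such transition-based Bellman equation is immediately available.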