IEEE Trans Cybern. 2015 Feb;45(2):165-76. doi: 10.1109/TCYB.2014.2322116. Epub 2014 May 29.
This paper presents a Q-learning method to solve the discounted linear quadratic regulator (LQR) problem for continuous-time (CT) continuous-state systems. Most methods in the existing literature for solving the LQR problem for CT systems require partial or complete knowledge of the system dynamics. Q-learning is effective for unknown dynamical systems, but has generally been well understood only for discrete-time systems. The contribution of this paper is a Q-learning methodology for CT systems that solves the LQR problem without any knowledge of the system dynamics. A natural and rigorously justified parameterization of the Q-function is given in terms of the state, the control input, and its derivatives. This parameterization enables the implementation of an online Q-learning algorithm for CT systems. Simulation results supporting the theoretical development are also presented.
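To make the general idea concrete, the following is a minimal sketch of model-free Q-learning for LQR in the standard *discrete-time* setting, which the abstract cites as the well-understood case; it is an illustrative analogue, not the paper's CT algorithm. The Q-function of a linear policy is quadratic, `Q(x,u) = [x;u]ᵀ H [x;u]`, so `H` can be fit by least squares from sampled transitions using only observed states, inputs, and stage costs; the matrices `A`, `B` below are used solely to simulate data, never by the learner. All names (`q_learning_lqr`, iteration counts, the test system) are chosen for this sketch.

```python
import numpy as np

def q_learning_lqr(A, B, Q, R, gamma=0.9, iters=20, samples=400, seed=0):
    """Model-free policy-iteration Q-learning for the discounted
    discrete-time LQR problem. The learner sees only (x, u, cost, x'),
    never the dynamics matrices A and B."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    K = np.zeros((m, n))          # initial policy u = K x
    p = n + m
    for _ in range(iters):
        Phi, y = [], []
        for _ in range(samples):
            x = rng.standard_normal(n)
            u = K @ x + rng.standard_normal(m)   # exploratory input
            x1 = A @ x + B @ u                   # simulated transition
            u1 = K @ x1                          # on-policy next input
            z = np.concatenate([x, u])
            z1 = np.concatenate([x1, u1])
            # Bellman residual basis: Q(z) - gamma*Q(z1) = stage cost
            Phi.append(np.outer(z, z).ravel() - gamma * np.outer(z1, z1).ravel())
            y.append(x @ Q @ x + u @ R @ u)
        h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
        H = h.reshape(p, p)
        H = 0.5 * (H + H.T)                      # symmetrize the Q-matrix
        Huu, Hux = H[n:, n:], H[n:, :n]
        K = -np.linalg.solve(Huu, Hux)           # greedy policy improvement
    return K, H
```

The learned gain can be checked against the exact solution of the discounted discrete-time Riccati equation; on a small stable system the two agree closely after a few policy-iteration sweeps. The paper's contribution is to obtain an analogous data-driven scheme directly in continuous time, where no such transition-based Bellman equation is immediately available.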