Song Shijie, Zhu Minglei, Dai Xiaolin, Gong Dawei
IEEE Trans Neural Netw Learn Syst. 2022 Jun 3;PP. doi: 10.1109/TNNLS.2022.3178746.
In this article, a novel model-free dynamic inversion-based Q-learning (DIQL) algorithm is proposed to solve the optimal tracking control (OTC) problem for unknown nonlinear input-affine discrete-time (DT) systems. Compared with the existing DIQL algorithm and the discount factor-based Q-learning (DFQL) algorithm, the proposed algorithm can eliminate the tracking error while remaining model-free and off-policy. First, a new deterministic Q-learning iterative scheme is presented, and based on this scheme, a model-based off-policy DIQL algorithm is designed. The advantage of this new scheme is that it avoids training on anomalous data and improves data utilization, thereby saving computing resources. Simultaneously, the convergence and stability of the designed algorithm are analyzed, and it is proved that adding probing noise to the behavior policy does not affect convergence. Then, by introducing neural networks (NNs), a model-free version of the designed algorithm is further proposed so that the OTC problem can be solved without any knowledge of the system dynamics. Finally, three simulation examples are given to demonstrate the effectiveness of the proposed algorithm.
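To make the deterministic Q-learning iteration mentioned in the abstract concrete, the sketch below runs a Q-function value iteration on a scalar discounted linear-quadratic problem. This is only an illustrative stand-in, not the paper's DIQL algorithm: the system `x_{k+1} = a*x + b*u`, the quadratic cost `x^2 + u^2`, and the discount factor `gamma` are all assumed for the example.

```python
# Hedged illustration: deterministic Q-learning value iteration for a scalar
# discounted LQ problem. NOT the paper's DIQL algorithm; it only sketches the
# kind of Q-function fixed-point iteration the abstract refers to.
# Assumed system: x_{k+1} = a*x + b*u, stage cost x^2 + u^2, discount gamma.

def q_iteration(a, b, gamma=0.95, tol=1e-10, max_iter=1000):
    """Iterate the quadratic Q-function Q_i(x, u) = [x u] H_i [x u]^T."""
    p = 0.0  # value-function weight: V_i(x) = p * x^2, initialized at zero
    for _ in range(max_iter):
        # Build the Q-matrix entries from the current value estimate.
        hxx = 1.0 + gamma * a * a * p
        hxu = gamma * a * b * p
        huu = 1.0 + gamma * b * b * p
        # Greedy minimization over u gives the next value weight.
        p_new = hxx - hxu * hxu / huu
        if abs(p_new - p) < tol:
            p = p_new
            break
        p = p_new
    # Converged feedback gain: u = gain * x.
    gain = -gamma * a * b * p / (1.0 + gamma * b * b * p)
    return p, gain

if __name__ == "__main__":
    # Example with an open-loop unstable system (a = 1.2); the learned
    # gain stabilizes the discounted closed loop.
    p, gain = q_iteration(a=1.2, b=1.0)
    print(p, gain)
```

The fixed point of this iteration satisfies the discounted algebraic Riccati equation, and the resulting closed loop `a + b*gain` is contractive for the example parameters, which is the sense in which such Q-iterations converge to the optimal controller.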