Ha Mingming, Wang Ding, Liu Derong
IEEE Trans Cybern. 2022 Dec;52(12):13262-13274. doi: 10.1109/TCYB.2021.3107801. Epub 2022 Nov 18.
This article is concerned with the stability of the closed-loop system under the various control policies generated by value iteration (VI). Several stability properties, including admissibility criteria and the attraction domain, are investigated. By combining the advantages of VI and policy iteration, an offline integrated VI scheme with a stability guarantee is developed, which makes it convenient to obtain admissible control policies. In addition, based on the concept of the attraction domain, an online adaptive dynamic programming (ADP) algorithm using immature control policies is developed. Remarkably, the state trajectory under the online algorithm is guaranteed to converge to the origin. In particular, for linear systems, the online ADP algorithm with a general scheme possesses a stronger stability property. The theoretical results reveal that the stability of the linear system can be guaranteed even if the control policy sequence includes finitely many unstable elements. Numerical results verify the effectiveness of the presented algorithms.
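For the linear case, the ideas in the abstract can be illustrated with a scalar discrete-time LQR problem. The sketch below runs VI from V_0 = 0, checks which intermediate policies are admissible (for a linear system, a linear gain is admissible exactly when the closed loop is asymptotically stable), switches to policy iteration once an admissible gain appears, and applies the immature VI gains online, one per time step. Everything concrete here is an assumption for illustration: the system x_{k+1} = a x_k + b u_k, the weights q and r, the function names (vi_gains, is_admissible, evaluate_policy, vi_then_pi, simulate_immature), and the VI-to-PI switch rule are hypothetical. This is not the paper's integrated VI scheme or its online ADP algorithm; in particular, the online part ignores the attraction-domain analysis.

```python
"""Scalar LQR sketch (hypothetical example, not the paper's algorithms).

System: x_{k+1} = a*x_k + b*u_k, stage cost q*x^2 + r*u^2.
Every VI value function is V_i(x) = p_i*x^2, and the greedy policy
derived from V_i is the linear feedback u = -k_i*x.
"""


def vi_gains(a, b, q, r, iters):
    """Run VI from V_0 = 0 and return the greedy gains k_0, ..., k_{iters-1}."""
    p = 0.0  # V_0(x) = 0, the usual VI initialization
    gains = []
    for _ in range(iters):
        # Greedy policy: argmin_u [q x^2 + r u^2 + V_i(a x + b u)] = -k_i x
        k = a * b * p / (r + b * b * p)
        gains.append(k)
        # Value update: V_{i+1}(x) = q x^2 + r (k x)^2 + V_i((a - b k) x)
        p = q + r * k * k + p * (a - b * k) ** 2
    return gains


def is_admissible(a, b, k):
    """u = -k x is admissible iff |a - b k| < 1 (stable, finite quadratic cost)."""
    return abs(a - b * k) < 1.0


def evaluate_policy(a, b, q, r, k):
    """Policy evaluation for an admissible gain: solve p = q + r k^2 + p (a - b k)^2."""
    return (q + r * k * k) / (1.0 - (a - b * k) ** 2)


def vi_then_pi(a, b, q, r, max_vi=100, pi_steps=10):
    """Hypothetical switch rule: run VI until the first admissible gain appears,
    then refine it with policy iteration (evaluation + improvement)."""
    gains = vi_gains(a, b, q, r, max_vi)
    k = next(g for g in gains if is_admissible(a, b, g))  # first admissible VI gain
    for _ in range(pi_steps):
        p = evaluate_policy(a, b, q, r, k)  # policy evaluation
        k = a * b * p / (r + b * b * p)     # policy improvement
    return k


def simulate_immature(a, b, gains, x0, steps):
    """Apply immature VI policies online: at time t use k_{min(t, len(gains) - 1)}."""
    x = x0
    for t in range(steps):
        k = gains[min(t, len(gains) - 1)]
        x = (a - b * k) * x  # x_{t+1} = a x_t + b u_t with u_t = -k x_t
    return x


if __name__ == "__main__":
    a, b, q, r = 1.5, 1.0, 0.1, 1.0  # open-loop unstable since |a| > 1
    gains = vi_gains(a, b, q, r, 20)
    unstable = [i for i, k in enumerate(gains) if not is_admissible(a, b, k)]
    print("non-admissible VI iterates:", unstable)  # prints "non-admissible VI iterates: [0, 1, 2]"
    print("VI-then-PI gain:", vi_then_pi(a, b, q, r))
    print("x after 60 online steps:", simulate_immature(a, b, gains, 1.0, 60))
```

With these numbers the first three VI gains are not stabilizing (closed-loop factors a - b k_i of about 1.5, 1.36 and 1.15), yet the simulated state still decays to zero. In the scalar case this is easy to see: the finitely many unstable steps only scale x by a bounded factor, after which every remaining closed-loop factor has magnitude below 1 and approaches |a - b k*| of about 0.62. A matrix version would replace the scalar updates with the discrete-time Riccati recursion and a Lyapunov equation for policy evaluation.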