Wang Ding, Zhao Mingming, Ha Mingming, Qiao Junfei
IEEE Trans Neural Netw Learn Syst. 2023 Nov;34(11):8707-8718. doi: 10.1109/TNNLS.2022.3152268. Epub 2023 Oct 27.
This article investigates the general value iteration (GVI) algorithm for discrete-time zero-sum games. The theoretical analysis focuses on the stability of the controlled systems and the admissibility of the iterative policy pairs. A new criterion is established to determine whether the current policy pair is admissible. Based on this admissibility criterion, an improved GVI algorithm for zero-sum games is developed, which guarantees that all iterative policy pairs are admissible if the current policy pair satisfies the criterion. Using the attraction domain, we demonstrate that if the initial state belongs to the domain, the state trajectory stays in that region under either the fixed or the evolving policy pair. It is emphasized that the evolving policy pair can stabilize the controlled system. These theoretical results are applied to linear and nonlinear systems via offline and online critic control design.
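To make the value-iteration idea in the abstract concrete, the following is a minimal sketch for the linear-quadratic special case only. It assumes dynamics x_{k+1} = A x_k + B u_k + D w_k, stage cost x'Qx + u'Ru - gamma^2 w'w, and a quadratic value function V_i(x) = x'P_i x started from an arbitrary positive semidefinite P_0, which is the sense in which the iteration is "general". The matrices, the value of gamma, and the function names are hypothetical and chosen for illustration. The closed-loop check uses the standard spectral-radius test for linear systems; it is a stand-in and is not the admissibility criterion derived in the article.

```python
import numpy as np


def gvi_step(P, A, B, D, Q, R, gamma):
    """One value-iteration update for a linear-quadratic zero-sum game.

    Returns the next value matrix and the stacked gain G of the policy
    pair induced by P, so that [u; w] = -G x.
    """
    m, q = B.shape[1], D.shape[1]
    BD = np.hstack([B, D])
    # M = [[R + B'PB, B'PD], [D'PB, D'PD - gamma^2 I]]
    weights = np.block([
        [R, np.zeros((m, q))],
        [np.zeros((q, m)), -gamma**2 * np.eye(q)],
    ])
    M = BD.T @ P @ BD + weights
    # The maximizing player needs a strictly concave problem.
    if np.max(np.linalg.eigvalsh(M[m:, m:])) >= 0:
        raise ValueError("gamma is too small: D'PD - gamma^2 I is not negative definite")
    N = BD.T @ P @ A
    G = np.linalg.solve(M, N)
    P_next = Q + A.T @ P @ A - N.T @ G
    return 0.5 * (P_next + P_next.T), G  # symmetrize against round-off


def general_value_iteration(A, B, D, Q, R, gamma, P0, tol=1e-10, max_iter=5000):
    """Iterate from an arbitrary PSD matrix P0 until the value matrix settles."""
    P = P0
    for i in range(max_iter):
        P_next, G = gvi_step(P, A, B, D, Q, R, gamma)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next, G, i + 1
        P = P_next
    raise RuntimeError("value iteration did not converge")


def spectral_radius(A, B, D, G):
    """Largest eigenvalue magnitude of the closed loop A - [B D] G."""
    return np.max(np.abs(np.linalg.eigvals(A - np.hstack([B, D]) @ G)))


# Hypothetical example system (open-loop unstable, controllable).
A = np.array([[1.1, 0.2], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.1], [0.0]])
Q = np.eye(2)
R = np.eye(1)
gamma = 5.0
P0 = 2.0 * np.eye(2)  # nonzero PSD initial value matrix

P_star, G_star, n_iter = general_value_iteration(A, B, D, Q, R, gamma, P0)
rho = spectral_radius(A, B, D, G_star)
residual = np.max(np.abs(gvi_step(P_star, A, B, D, Q, R, gamma)[0] - P_star))
print("iterations:", n_iter)
print("closed-loop spectral radius:", rho)
print("Riccati residual:", residual)
```

Calling `spectral_radius` on the gain returned at an intermediate iteration gives a simple way to observe whether that iterative policy pair stabilizes this particular linear system, which is the kind of question the admissibility analysis in the article addresses in a more general setting.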