IEEE Trans Cybern. 2021 May;51(5):2419-2432. doi: 10.1109/TCYB.2019.2926248. Epub 2021 Apr 15.
In this paper, we study the constrained optimization problem of a class of uncertain nonlinear interconnected systems. First, we prove that the solution of the constrained optimization problem can be obtained through solving an array of optimal control problems of constrained auxiliary subsystems. Then, under the framework of approximate dynamic programming, we present a simultaneous policy iteration (SPI) algorithm to solve the Hamilton-Jacobi-Bellman equations corresponding to the constrained auxiliary subsystems. By building an equivalence relationship, we demonstrate the convergence of the SPI algorithm. Meanwhile, we implement the SPI algorithm via an actor-critic structure, where actor networks are used to approximate optimal control policies and critic networks are applied to estimate optimal value functions. By using the least squares method and the Monte Carlo integration technique together, we are able to determine the weight vectors of actor and critic networks. Finally, we validate the developed control method through the simulation of a nonlinear interconnected plant.
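As a rough illustration of how policy iteration can be combined with a least-squares critic fit over randomly sampled states, the following minimal sketch uses a deliberately simplified setting. It is not the SPI algorithm of the paper: the scalar linear plant, the quadratic cost, the polynomial critic basis, the absence of input constraints and interconnections, and all names (`grad_phi`, `policy`, `evaluate_policy`, `policy_iteration`) are assumptions made only for this example. The least-squares fit minimizes a Monte Carlo estimate of the integrated squared Hamilton-Jacobi-Bellman residual, and the actor is taken directly from the critic gradient rather than from a separate network.

```python
import numpy as np

# Hypothetical toy plant (not from the paper): dx/dt = a*x + b*u,
# running cost q*x**2 + r*u**2, no input constraint, model assumed known.
a, b, q, r = -1.0, 1.0, 1.0, 1.0


def grad_phi(x):
    """Gradient of the assumed critic basis phi(x) = [x**2, x**4]; shape (N, 2)."""
    return np.stack([2.0 * x, 4.0 * x**3], axis=1)


def policy(x, w):
    """Control induced by critic weights w: u = -(b / (2r)) * dV/dx."""
    return -(b / (2.0 * r)) * (grad_phi(x) @ w)


def evaluate_policy(x, u):
    """Fit critic weights for fixed control samples u by least squares.

    The residual at each sample is
        grad_phi(x) @ w * (a*x + b*u) + q*x**2 + r*u**2,
    which is linear in w. Minimizing its sum of squares over uniformly
    drawn states is a Monte Carlo estimate of the integrated squared residual.
    """
    A = grad_phi(x) * (a * x + b * u)[:, None]
    c = q * x**2 + r * u**2
    w, *_ = np.linalg.lstsq(A, -c, rcond=None)
    return w


def policy_iteration(num_samples=200, num_iters=10, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=num_samples)  # Monte Carlo state samples
    w = np.zeros(2)  # initial policy u = 0 is stabilizing because a < 0
    for _ in range(num_iters):
        u = policy(x, w)           # policy improvement from current critic
        w = evaluate_policy(x, u)  # policy evaluation by least squares
    return w


w_final = policy_iteration()
# Closed-form solution of the scalar Riccati equation 2*a*p - (b**2/r)*p**2 + q = 0.
p_riccati = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2
print(w_final, p_riccati)
```

In this toy case the optimal value function is quadratic, so the weight on `x**2` should approach the Riccati solution and the weight on `x**4` should stay near zero. Constrained inputs, unknown interconnection terms, and separate actor networks, as treated in the paper, would require a different cost function and additional approximators.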