Li Man, Qin Jiahu, Ma Qichao, Zheng Wei Xing, Kang Yu
IEEE Trans Neural Netw Learn Syst. 2021 Apr;32(4):1600-1611. doi: 10.1109/TNNLS.2020.2985738. Epub 2021 Apr 2.
Considering that, in the real world, a certain agent may have an advantage that allows it to act before others, a novel hierarchical optimal synchronization problem for linear systems, composed of one major agent and multiple minor agents, is formulated and studied in this article from a Stackelberg-Nash game perspective. The major agent makes its decision prior to the others, and then all the minor agents determine their actions simultaneously. To seek the optimal controllers, Hamilton-Jacobi-Bellman (HJB) equations in coupled form are established, whose solutions are further proven to be stabilizing and to constitute the Stackelberg-Nash equilibrium. Due to the asymmetric roles introduced for the agents, the established HJB equations are more strongly coupled and more difficult to solve than those in most existing works. Therefore, we propose a new reinforcement learning (RL) algorithm, namely, a two-level value iteration (VI) algorithm, which does not rely on complete knowledge of the system matrices. Furthermore, the proposed algorithm is shown to be convergent, and the converged values are exactly the optimal ones. To implement this VI algorithm, neural networks (NNs) are employed to approximate the value functions, and the gradient descent method is used to update the NN weights. Finally, an illustrative example is provided to verify the effectiveness of the proposed algorithm.
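Not part of the paper: below is a minimal, model-based sketch of the nested two-level iteration pattern the abstract describes, on a toy linear-quadratic problem with one leader (the major agent) and two followers (minor agents). All matrices, dimensions, and names are illustrative assumptions. The authors' algorithm is model-free (it does not rely on complete system matrices) and targets the Stackelberg-Nash equilibrium of the strongly coupled HJB equations; this sketch instead uses known dynamics and a plain best-response Riccati backup at each level, so it only illustrates the two-level structure, not the paper's method.

```python
import numpy as np

# Toy setup (all matrices are illustrative assumptions, not from the paper):
# agent 0 is the leader, agents 1 and 2 are followers.
n = 2                                 # state dimension
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])            # stable open-loop dynamics
B = [np.array([[1.0], [0.0]]),        # leader input matrix
     np.array([[0.0], [1.0]]),        # follower 1 input matrix
     np.array([[0.5], [0.5]])]        # follower 2 input matrix
Q = [np.eye(n) for _ in range(3)]     # state weights per agent
R = [np.eye(1) for _ in range(3)]     # input weights per agent

def riccati_update(P, K_all, i):
    """One value-iteration (Riccati) backup for agent i, holding the
    other agents' feedback gains fixed: V_i(x) = x' P_i x."""
    # Closed-loop dynamics contributed by everyone except agent i
    A_cl = A - sum(B[j] @ K_all[j] for j in range(3) if j != i)
    # Standard LQR backup against A_cl
    S = R[i] + B[i].T @ P @ B[i]
    K_i = np.linalg.solve(S, B[i].T @ P @ A_cl)
    P_new = (Q[i] + K_i.T @ R[i] @ K_i
             + (A_cl - B[i] @ K_i).T @ P @ (A_cl - B[i] @ K_i))
    return P_new, K_i

K = [np.zeros((1, n)) for _ in range(3)]   # initial feedback gains
P = [np.eye(n) for _ in range(3)]          # initial value matrices

for outer in range(100):                   # leader-level iteration
    # Inner level: followers update simultaneously (Nash-style response
    # to each other, given the leader's current gain)
    for inner in range(100):
        P1, K1 = riccati_update(P[1], K, 1)
        P2, K2 = riccati_update(P[2], K, 2)
        P[1], K[1] = P1, K1
        P[2], K[2] = P2, K2
    # Outer level: leader updates against the followers' converged response
    P[0], K[0] = riccati_update(P[0], K, 0)

print("leader gain K0:", K[0])
print("follower gains K1, K2:", K[1], K[2])
```

Note that this leader update does not anticipate how the followers' gains would react to a change in the leader's policy, which is exactly the coupling the paper's Stackelberg formulation handles through its hierarchical HJB equations.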
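Also not from the paper: a sketch of the value-function approximation step the abstract mentions, i.e., gradient descent on the weights of a value approximator. For brevity it uses a linear-in-weights quadratic basis (the natural special case of an NN approximator for linear-quadratic value functions) and a semi-gradient step on the Bellman residual, on a single autonomous system; the system matrix, basis, learning rate, and episode counts are all assumptions.

```python
import numpy as np

def phi(x):
    """Quadratic basis for a 2-D state: V(x) ~= w . phi(x).
    (The paper uses NNs; a quadratic basis is the linear-in-weights
    special case suited to quadratic value functions.)"""
    x1, x2 = x
    return np.array([x1 * x1, x1 * x2, x2 * x2])

def td_gradient_step(w, x, x_next, stage_cost, lr=1e-2):
    """One gradient-descent step on the squared Bellman residual
    e = w.phi(x) - (stage_cost + w.phi(x_next)), with the target
    held fixed (a semi-gradient update, a common simplification)."""
    e = w @ phi(x) - (stage_cost + w @ phi(x_next))
    return w - lr * e * phi(x)

# Illustrative use on trajectories of a stable autonomous linear system
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
Q = np.eye(2)                          # stage cost x'Qx
w = np.zeros(3)
rng = np.random.default_rng(0)
for episode in range(500):
    x = rng.standard_normal(2)         # random initial state
    for k in range(30):
        x_next = A @ x
        w = td_gradient_step(w, x, x_next, stage_cost=x @ Q @ x)
        x = x_next
print("learned weights w (V(x) ~= w . phi(x)):", w)
```

For this stable system the fixed point of the update satisfies V(x) = x'Qx + V(Ax), i.e., the weights encode the solution of the discrete Lyapunov equation P = Q + A'PA, which is what makes the example checkable.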