Fan Litong, Yu Dengxiu, Cheong Kang Hao, Wang Zhen
IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):12827-12839. doi: 10.1109/TNNLS.2024.3453385.
This article presents an optimal evolution strategy for continuous strategy games on complex networks via reinforcement learning (RL). In the past, evolutionary game theory has usually assumed that agents interact with the same selection intensity, ignoring differences in their learning ability and willingness to learn; moreover, individuals are reluctant to change their strategies drastically. We therefore design an adaptive strategy-updating framework with heterogeneous selection intensities for continuous strategy games on complex networks, based on imitation dynamics, which allows agents to reach the optimal state and a higher level of cooperation with minimal strategy changes. The optimal updating strategy is obtained from a coupled Hamilton-Jacobi-Bellman (HJB) equation by minimizing a performance function that aims to maximize individual payoffs while minimizing strategy changes. Furthermore, a value-iteration (VI) RL algorithm is proposed to approximate the HJB solutions and learn the optimal strategy-updating rules. The RL algorithm employs actor and critic neural networks to approximate the strategy changes and the performance functions, with weights updated by gradient descent. The stability and convergence of the proposed methods are proved via a designed Lyapunov function. Simulations validate the convergence and effectiveness of the proposed methods on different games and complex networks.
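The abstract states the formulation only verbally. Under assumed notation (x_i for agent i's continuous strategy, u_i = \dot{x}_i for its strategy change, P_i for its payoff, R_i for a positive-definite change-cost weight, and V_i for its value function; none of these symbols are taken from the paper), a performance function of the kind described and the corresponding coupled HJB condition would read

\[
J_i = \int_0^{\infty} \Big( -P_i\big(x(t)\big) + u_i^{\top}(t)\, R_i\, u_i(t) \Big)\, dt, \qquad \dot{x}_i = u_i,
\]
\[
0 = \min_{u_i} \Big( -P_i(x) + u_i^{\top} R_i\, u_i + \sum_{j=1}^{N} \nabla_{x_j} V_i(x)^{\top} u_j \Big), \qquad u_i^{*} = -\tfrac{1}{2} R_i^{-1} \nabla_{x_i} V_i(x),
\]

where the equations of the N agents are coupled through the neighbors' strategy changes u_j. This is a generic sketch of such a formulation, not the paper's exact equations.

The value-iteration actor-critic idea can likewise be illustrated with a toy single-agent example. In the sketch below (Python; the quadratic payoff, the features, and all parameter values are assumptions for illustration, not the paper's implementation), a linear-in-features critic is trained by gradient descent on the HJB residual and a linear-in-features actor is regressed onto the resulting minimizer, mirroring the gradient-descent weight updates described above.

```python
# Minimal single-agent sketch (all names and the toy setup are assumptions):
# the strategy x is scalar, the payoff is P(x) = -(x - x_star)^2, changing the
# strategy costs r*u^2, and the strategy evolves as x_dot = u.  A critic
# approximates the value function V(x) and an actor approximates the strategy
# change u(x); the critic descends the HJB residual and the actor tracks the
# HJB minimizer u* = -V_x / (2 r).
import numpy as np

rng = np.random.default_rng(0)
x_star, r = 1.0, 0.5            # assumed target strategy and change-cost weight
lr_c, lr_a = 1e-2, 1e-2         # critic / actor step sizes

phi  = lambda x: np.array([x**2, x])   # critic features: V(x) = w @ phi(x)
dphi = lambda x: np.array([2*x, 1.0])  # d phi / dx
psi  = lambda x: np.array([x, 1.0])    # actor features: u(x) = theta @ psi(x)

w = np.array([0.5, -1.0])       # rough positive-definite initial guess for V
theta = np.zeros(2)

for _ in range(20000):
    x = rng.uniform(0.0, 2.0)           # sample a strategy state around x_star
    Vx = w @ dphi(x)                    # dV/dx under the current critic
    u_star = -Vx / (2.0 * r)            # minimizer of the Hamiltonian

    # HJB residual: running cost (negative payoff + change cost) plus Vx * x_dot
    delta = (x - x_star)**2 + r * u_star**2 + Vx * u_star

    # Critic: gradient step on 0.5*delta^2 (d delta / dw = u_star * dphi(x))
    w -= lr_c * delta * u_star * dphi(x)

    # Actor: regress the parameterized policy onto the HJB minimizer
    theta -= lr_a * (theta @ psi(x) - u_star) * psi(x)

# For this toy problem the HJB solution is u*(x) = -sqrt(2) * (x - x_star), so
# the learned policy should give u(0) near 1.41 and u(x_star) near 0.
print("u(0) =", theta @ psi(0.0), " u(1) =", theta @ psi(1.0))
```

Decoupling the actor regression from the critic's residual descent keeps this toy example stable; the paper's full scheme, with networked agents, heterogeneous selection intensities, and neural-network approximators, is substantially more involved.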