
Data-Based Reinforcement Learning for Nonzero-Sum Games With Unknown Drift Dynamics.

Author information

Zhang Qichao, Zhao Dongbin

Publication information

IEEE Trans Cybern. 2019 Aug;49(8):2874-2885. doi: 10.1109/TCYB.2018.2830820. Epub 2018 May 16.

Abstract

This paper addresses the nonlinear optimization problem of nonzero-sum (NZS) games with unknown drift dynamics. A data-based integral reinforcement learning (IRL) method is proposed to approximate the Nash equilibrium of NZS games iteratively. Furthermore, we prove that the data-based IRL method is equivalent to the model-based policy iteration algorithm, which guarantees the convergence of the proposed method. For implementation, a single-critic neural network structure for NZS games is given. To broaden the applicability of the data-based IRL method, we design updating laws for the critic weights based on offline and online iterative learning, respectively. The experience replay technique is introduced in the online iterative learning, which can improve the convergence rate of the critic weights during the learning process. The uniform ultimate boundedness of the critic weights is guaranteed using the Lyapunov method. Finally, numerical results demonstrate the effectiveness of the data-based IRL algorithm for nonlinear NZS games with unknown drift dynamics.
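
As a rough illustration of the equivalence claim in the abstract, the following Python sketch runs data-based policy iteration on a hypothetical scalar, linear-quadratic, two-player NZS game. Everything specific here is an assumption made for illustration and is not taken from the paper: the plant dx/dt = A*x + B[0]*u1 + B[1]*u2, the numeric values, the quadratic value form V_i(x) = p_i*x^2 in place of a critic neural network, costs that penalize only each player's own control, and exact integrals supplied by the simulator. The learner uses the integral Bellman equation on trajectory data together with the input gains B, but never reads the drift A; `model_based_value` is a reference that does use A.

```python
import math

# Hypothetical scalar two-player plant: dx/dt = A*x + B[0]*u1 + B[1]*u2.
# A is the drift term; only the simulator and the reference solver touch it.
A = -1.0
B = (1.0, 1.0)   # input gains, assumed known to the learner
Q = (1.0, 2.0)   # state weight q_i in player i's cost (assumed values)
R = (1.0, 1.0)   # own-control weight r_i in player i's cost (assumed values)


def collect_data(gains, x0=1.0, dt=0.1, n_intervals=20):
    """Simulate u_i = -k_i*x; return (x_start, x_end, integral of x^2) per interval."""
    a_cl = A - B[0] * gains[0] - B[1] * gains[1]  # closed-loop pole (simulator only)
    data, x = [], x0
    for _ in range(n_intervals):
        x_next = x * math.exp(a_cl * dt)
        # Exact integral of x(t)^2 over the interval for this linear plant.
        int_x2 = (x_next**2 - x**2) / (2.0 * a_cl)
        data.append((x, x_next, int_x2))
        x = x_next
    return data


def evaluate_policy(data, gains, i):
    """Least-squares fit of p_i from the integral Bellman equation; A is not used."""
    num = den = 0.0
    for x_start, x_end, int_x2 in data:
        phi = x_start**2 - x_end**2                   # [V_i(x(t)) - V_i(x(t+T))] / p_i
        cost = (Q[i] + R[i] * gains[i]**2) * int_x2   # integrated running cost
        num += phi * cost
        den += phi * phi
    return num / den


def data_based_policy_iteration(n_iter=30):
    gains = (0.0, 0.0)  # admissible start only because the open loop is stable here
    for _ in range(n_iter):
        data = collect_data(gains)
        p = tuple(evaluate_policy(data, gains, i) for i in range(2))
        # Policy improvement needs B but not A: k_i = b_i * p_i / r_i.
        gains = tuple(B[i] * p[i] / R[i] for i in range(2))
    return p, gains


def model_based_value(gains, i):
    """Reference p_i from the scalar Lyapunov equation, which does use the drift A."""
    a_cl = A - B[0] * gains[0] - B[1] * gains[1]
    return -(Q[i] + R[i] * gains[i]**2) / (2.0 * a_cl)


if __name__ == "__main__":
    p, gains = data_based_policy_iteration()
    print("value coefficients:", p)
    print("feedback gains:", gains)
```

In this toy case the least-squares estimate from `evaluate_policy` coincides with `model_based_value` for the same gains, which mirrors (but does not prove) the equivalence stated in the abstract. The single-critic network, the online update law, and experience replay are not modeled.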

Similar articles

Event-Triggered ADP for Nonzero-Sum Games of Unknown Nonlinear Systems.
IEEE Trans Neural Netw Learn Syst. 2022 May;33(5):1905-1913. doi: 10.1109/TNNLS.2021.3071545. Epub 2022 May 2.

Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics.
IEEE Trans Cybern. 2021 Jun;51(6):2929-2943. doi: 10.1109/TCYB.2019.2957406. Epub 2021 May 18.
