Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate.

Affiliations

Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China; Beijing Laboratory of Smart Environmental Protection, Beijing University of Technology, Beijing 100124, China.

Publication details

Neural Netw. 2024 Jul;175:106274. doi: 10.1016/j.neunet.2024.106274. Epub 2024 Mar 27.

DOI:10.1016/j.neunet.2024.106274
PMID:38583264
Abstract

In this paper, an adjustable Q-learning scheme is developed to solve the discrete-time nonlinear zero-sum game problem; the scheme accelerates the convergence of the iterative Q-function sequence. First, the monotonicity and convergence of the iterative Q-function sequence are analyzed under certain conditions. Moreover, by employing neural networks, the model-free tracking control problem for zero-sum games can be addressed. Second, two practical algorithms are designed to guarantee convergence while accelerating learning. In the first algorithm, an adjustable acceleration phase is added to the Q-learning iteration process and can be adaptively terminated with a convergence guarantee. In the second algorithm, a novel acceleration function is developed that adjusts the relaxation factor to ensure convergence. Finally, a simulation example with a practical physical background, implemented with neural networks, demonstrates the strong performance of the developed algorithms.
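The two acceleration ideas in the abstract can be illustrated with a small, hedged sketch. It is not the authors' method. It is tabular and model-based rather than neural and model-free, it uses an assumed discount factor `GAMMA` so that the min-max Bellman operator T is a contraction, and it takes the upper (min-max, pure-strategy) value. The relaxed update is written as Q_{i+1} = Q_i + λ_i (T Q_i − Q_i), so λ_i = 1 recovers plain Q-iteration. The toy game (`step`, `cost`), the residual-based rule for ending the acceleration phase, and the decaying schedule λ_i = 1 + (λ_0 − 1) ρ^i are all hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical toy problem (not from the paper): a deterministic zero-sum game
# on states 0..4. The controller u minimizes, the disturbance w maximizes.
STATES = range(5)
CONTROLS = (0, 1, 2)
DISTURBANCES = (0, 1)
GAMMA = 0.8  # assumed discount factor; makes T a sup-norm contraction


def step(x, u, w):
    """Next state: u pushes toward 0, w pushes away, clipped to [0, 4]."""
    return min(max(x - u + w, 0), 4)


def cost(x, u, w):
    """Stage cost: quadratic in x and u, with the disturbance penalized."""
    return x * x + u * u - 3 * w * w


KEYS = list(product(STATES, CONTROLS, DISTURBANCES))


def bellman(q):
    """Min-max Bellman operator T (upper value, pure strategies)."""
    value = {x: min(max(q[(x, u, w)] for w in DISTURBANCES) for u in CONTROLS)
             for x in STATES}
    return {(x, u, w): cost(x, u, w) + GAMMA * value[step(x, u, w)]
            for (x, u, w) in KEYS}


def residual(q, tq):
    """Sup-norm Bellman residual ||T Q - Q||."""
    return max(abs(tq[k] - q[k]) for k in KEYS)


def relaxed_update(q, tq, lam):
    """Q_{i+1} = Q_i + lam * (T Q_i - Q_i); lam = 1 is plain Q-iteration."""
    return {k: q[k] + lam * (tq[k] - q[k]) for k in KEYS}


def q_iteration_with_acceleration_phase(lam_acc=1.3, max_acc_steps=20,
                                        tol=1e-9, max_iter=10_000):
    """Variant 1: over-relax while the residual keeps shrinking (at most
    max_acc_steps times), then use lam = 1 for the rest of the run."""
    q = {k: 0.0 for k in KEYS}
    tq = bellman(q)
    r = residual(q, tq)
    accelerating = True
    for i in range(max_iter):
        if r < tol:
            return q, i
        lam = lam_acc if accelerating else 1.0
        q = relaxed_update(q, tq, lam)
        tq = bellman(q)
        r_new = residual(q, tq)
        if accelerating and (r_new >= r or i + 1 >= max_acc_steps):
            accelerating = False  # end the acceleration phase for good
        r = r_new
    raise RuntimeError("did not converge")


def q_iteration_with_acceleration_function(lam0=1.3, rho=0.7,
                                           tol=1e-9, max_iter=10_000):
    """Variant 2: hypothetical schedule lam_i = 1 + (lam0 - 1) * rho**i,
    which decays toward 1, so late iterations are plain Q-iteration."""
    q = {k: 0.0 for k in KEYS}
    for i in range(max_iter):
        tq = bellman(q)
        if residual(q, tq) < tol:
            return q, i
        q = relaxed_update(q, tq, 1.0 + (lam0 - 1.0) * rho ** i)
    raise RuntimeError("did not converge")
```

In both variants the final iterations use λ = 1, which is the standard contraction. The acceleration step therefore cannot prevent convergence, though it may slow it down. Whether λ > 1 actually saves iterations depends on the problem, and this toy makes no claim about that. Calling `q_iteration_with_acceleration_function(lam0=1.0)` gives the unaccelerated baseline for comparison.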

Similar articles

1. Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate.
   Neural Netw. 2024 Jul;175:106274. doi: 10.1016/j.neunet.2024.106274. Epub 2024 Mar 27.
2. Evolving and Incremental Value Iteration Schemes for Nonlinear Discrete-Time Zero-Sum Games.
   IEEE Trans Cybern. 2023 Jul;53(7):4487-4499. doi: 10.1109/TCYB.2022.3198078. Epub 2023 Jun 15.
3. Dichotomy value iteration with parallel learning design towards discrete-time zero-sum games.
   Neural Netw. 2023 Oct;167:751-762. doi: 10.1016/j.neunet.2023.09.009. Epub 2023 Sep 7.
4. Neural critic learning with accelerated value iteration for nonlinear model predictive control.
   Neural Netw. 2024 Aug;176:106364. doi: 10.1016/j.neunet.2024.106364. Epub 2024 May 6.
5. Optimal H∞ tracking control of nonlinear systems with zero-equilibrium-free via novel adaptive critic designs.
   Neural Netw. 2023 Jul;164:105-114. doi: 10.1016/j.neunet.2023.04.021. Epub 2023 Apr 20.
6. Advanced optimal tracking integrating a neural critic technique for asymmetric constrained zero-sum games.
   Neural Netw. 2024 Sep;177:106388. doi: 10.1016/j.neunet.2024.106388. Epub 2024 May 15.
7. Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints.
   IEEE Trans Neural Netw. 2009 Sep;20(9):1490-503. doi: 10.1109/TNN.2009.2027233. Epub 2009 Aug 4.
8. Novel optimal trajectory tracking for nonlinear affine systems with an advanced critic learning structure.
   Neural Netw. 2022 Oct;154:131-140. doi: 10.1016/j.neunet.2022.07.019. Epub 2022 Jul 16.
9. Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis.
   IEEE Trans Cybern. 2017 May;47(5):1224-1237. doi: 10.1109/TCYB.2016.2542923. Epub 2016 Apr 11.
10. A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm.
   IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):937-42. doi: 10.1109/TSMCB.2008.920269.