


Model-Free Optimal Tracking Control of Nonlinear Input-Affine Discrete-Time Systems via an Iterative Deterministic Q-Learning Algorithm.

Author Information

Song Shijie, Zhu Minglei, Dai Xiaolin, Gong Dawei

Publication Information

IEEE Trans Neural Netw Learn Syst. 2022 Jun 3;PP. doi: 10.1109/TNNLS.2022.3178746.

DOI: 10.1109/TNNLS.2022.3178746
PMID: 35657846
Abstract

In this article, a novel model-free dynamic inversion-based Q-learning (DIQL) algorithm is proposed to solve the optimal tracking control (OTC) problem of unknown nonlinear input-affine discrete-time (DT) systems. Compared with the existing DIQL algorithm and the discount factor-based Q-learning (DFQL) algorithm, the proposed algorithm can eliminate the tracking error while ensuring that it is model-free and off-policy. First, a new deterministic Q-learning iterative scheme is presented, and based on this scheme, a model-based off-policy DIQL algorithm is designed. The advantage of this new scheme is that it can avoid the training of unusual data and improve data utilization, thereby saving computing resources. Simultaneously, the convergence and stability of the designed algorithm are analyzed, and the proof that adding probing noise into the behavior policy does not affect the convergence is presented. Then, by introducing neural networks (NNs), the model-free version of the designed algorithm is further proposed so that the OTC problem can be solved without any knowledge about the system dynamics. Finally, three simulation examples are given to demonstrate the effectiveness of the proposed algorithm.
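The iterative, off-policy Q-learning idea the abstract describes can be illustrated on a toy problem. The sketch below is not the authors' DIQL algorithm: it applies the generic value-iteration update Q_{i+1}(x, u) = c(x, u) + min_{u'} Q_i(x', u') to a scalar linear-quadratic regulation problem, using random probing inputs as the behavior policy and least squares as the function approximator. All system parameters are hypothetical demo values.

```python
import numpy as np

# Scalar "unknown" plant, used only to generate data; the learner never
# reads a or b directly (hypothetical values for this demo).
a, b = 0.9, 0.5
q_cost, r_cost = 1.0, 0.1

def step(x, u):
    return a * x + b * u            # unknown dynamics (data source only)

def stage_cost(x, u):
    return q_cost * x**2 + r_cost * u**2

rng = np.random.default_rng(0)

# Quadratic Q-function: Q(x, u) = Hxx*x^2 + 2*Hxu*x*u + Huu*u^2
H = np.array([q_cost, 0.0, r_cost])  # parameters [Hxx, Hxu, Huu]

def greedy_gain(H):
    Hxx, Hxu, Huu = H
    return Hxu / Huu                 # u = -K*x minimizes Q(x, .)

def q_min(H, x):
    Hxx, Hxu, Huu = H
    return (Hxx - Hxu**2 / Huu) * x**2   # min over u of Q(x, u)

# Iterative (value-iteration style) Q-learning from off-policy data:
# the behavior policy is pure probing noise, and each iteration fits
# Q_{i+1}(x,u) = c(x,u) + min_u' Q_i(x',u') by least squares.
for it in range(60):
    X, y = [], []
    for _ in range(40):
        x = rng.uniform(-2, 2)
        u = rng.uniform(-2, 2)       # exploratory behavior policy
        xn = step(x, u)              # one transition from the plant
        X.append([x**2, 2 * x * u, u**2])
        y.append(stage_cost(x, u) + q_min(H, xn))
    H, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)

K = greedy_gain(H)

# Check against the gain obtained from the (here known) model by
# iterating the discrete-time Riccati equation.
P = q_cost
for _ in range(500):
    P = q_cost + a**2 * P - (a * b * P)**2 / (r_cost + b**2 * P)
K_riccati = a * b * P / (r_cost + b**2 * P)
print(K, K_riccati)  # the learned gain should approach the Riccati gain
```

Because the data are generated without using the greedy policy, the scheme is off-policy in the same sense as the abstract's algorithm: probing noise in the behavior policy does not bias the fitted Q-function, since the Bellman target is evaluated with the greedy minimizer rather than the applied input.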


Similar Articles

1. Model-Free Optimal Tracking Control of Nonlinear Input-Affine Discrete-Time Systems via an Iterative Deterministic Q-Learning Algorithm.
IEEE Trans Neural Netw Learn Syst. 2022 Jun 3;PP. doi: 10.1109/TNNLS.2022.3178746.
2. Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems.
IEEE Trans Neural Netw Learn Syst. 2019 May;30(5):1308-1320. doi: 10.1109/TNNLS.2018.2861945. Epub 2018 Sep 26.
3. Data-Driven Optimal Tracking Control for Discrete-Time Nonlinear Systems With Unknown Dynamics Using Deterministic ADP.
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):1184-1198. doi: 10.1109/TNNLS.2023.3323142. Epub 2025 Jan 7.
4. Model-Free Q-Learning for the Tracking Problem of Linear Discrete-Time Systems.
IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):3191-3201. doi: 10.1109/TNNLS.2022.3195357. Epub 2024 Feb 29.
5. Novel optimal trajectory tracking for nonlinear affine systems with an advanced critic learning structure.
Neural Netw. 2022 Oct;154:131-140. doi: 10.1016/j.neunet.2022.07.019. Epub 2022 Jul 16.
6. Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems.
IEEE Trans Neural Netw Learn Syst. 2015 Jan;26(1):140-51. doi: 10.1109/TNNLS.2014.2358227. Epub 2014 Oct 8.
7. Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate.
Neural Netw. 2024 Jul;175:106274. doi: 10.1016/j.neunet.2024.106274. Epub 2024 Mar 27.
8. Asynchronous iterative Q-learning based tracking control for nonlinear discrete-time multi-agent systems.
Neural Netw. 2024 Dec;180:106667. doi: 10.1016/j.neunet.2024.106667. Epub 2024 Aug 26.
9. Data-Driven H∞ Optimal Output Feedback Control for Linear Discrete-Time Systems Based on Off-Policy Q-Learning.
IEEE Trans Neural Netw Learn Syst. 2023 Jul;34(7):3553-3567. doi: 10.1109/TNNLS.2021.3112457. Epub 2023 Jul 6.
10. Online adaptive policy learning algorithm for H∞ state feedback control of unknown affine nonlinear discrete-time systems.
IEEE Trans Cybern. 2014 Dec;44(12):2706-18. doi: 10.1109/TCYB.2014.2313915. Epub 2014 Jul 28.

Cited By

1. Optimal control under safety constraints and disturbances: a multi-step, off-policy adaptive dynamic programming approach.
Nonlinear Dyn. 2025;113(17):22973-22999. doi: 10.1007/s11071-025-11329-3. Epub 2025 Jun 15.