
Accelerated Value Iteration-Based Safe Q-Learning for Data-Driven Optimal Tracking Control.

Authors

Zhao Mingming, Wang Ding, Song Shijie, Qiao Junfei

Publication information

IEEE Trans Cybern. 2025 Jul;55(7):3511-3524. doi: 10.1109/TCYB.2025.3562172.

DOI:10.1109/TCYB.2025.3562172
PMID:40315067
Abstract

This article develops an accelerated value iteration-based safe Q-learning (SQL) algorithm for designing tracking controllers for unknown nonlinear systems. First, an augmented Q-function, consisting of a quadratic utility function and an adjustable positive-definite control barrier function (CBF), is devised to ensure both the optimality and the safety of the tracking controller. The quadratic utility function, associated with optimality, guarantees that the tracking controller eliminates the ultimate tracking error regardless of the reference trajectory. The adjustable positive-definite CBF, pertaining to safety, ensures that the tracking error converges toward zero faster while remaining within the safe set at all times. Second, an accelerated iterative learning mechanism, comprising policy evaluation (PE) and policy improvement (PI), is employed to find the safe optimal tracking control policy. Integrating the difference between two iterative Q-functions into the current PE process speeds up the convergence of the SQL algorithm, and a policy optimization technique based on the Nesterov momentum method accelerates the PI process. When a large amount of offline data must be processed, this two-stage accelerated learning effectively reduces the computational burden. Furthermore, the convergence of the Q-function sequence and the safety of the optimal tracking policy are analyzed theoretically. Finally, two simulation examples, implemented with neural networks and the action-critic structure, verify the effectiveness of the accelerated SQL methods.
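
The two ingredients named in the abstract, a utility augmented with a barrier term and a Nesterov-momentum policy improvement step, can be illustrated with a minimal scalar sketch. This is not the authors' implementation: the log-barrier form, the scalar error dynamics `e_next = a*e + b*u`, the quadratic next-step value `p*e_next**2` (which omits the barrier's contribution for simplicity), and every parameter value below are assumptions chosen only for illustration.

```python
import math


def augmented_utility(e, u, q=1.0, r=1.0, gamma=0.5, c=2.0):
    """Quadratic tracking cost plus an assumed log-barrier on the error.

    The barrier is zero at e = 0, positive for 0 < |e| < c, and the cost is
    infinite outside the (hypothetical) safe set |e| < c. `gamma` plays the
    role of an adjustable barrier weight.
    """
    if abs(e) >= c:
        return math.inf
    barrier = gamma * math.log(c * c / (c * c - e * e))
    return q * e * e + r * u * u + barrier


def nesterov_minimize(grad, u0, lr=0.1, momentum=0.9, steps=200):
    """Minimize a scalar function with Nesterov momentum, given its gradient."""
    u, v = u0, 0.0
    for _ in range(steps):
        lookahead = u + momentum * v          # evaluate the gradient ahead of u
        v = momentum * v - lr * grad(lookahead)
        u = u + v
    return u


def improve_policy(e, a=0.9, b=0.5, r=1.0, p=2.0):
    """One policy-improvement step for an assumed quadratic Q-function.

    Q(e, u) = q*e^2 + r*u^2 + p*(a*e + b*u)^2; terms without u are dropped
    from the gradient because they do not affect the minimizer.
    """
    def grad(u):
        return 2.0 * r * u + 2.0 * p * b * (a * e + b * u)

    return nesterov_minimize(grad, u0=0.0)


if __name__ == "__main__":
    e = 1.0
    u_star = improve_policy(e)
    print("improved control:", u_star)
    print("augmented utility:", augmented_utility(e, u_star))
```

For this quadratic case the minimizer has the closed form `u = -p*a*b*e / (r + p*b**2)`, which gives a simple check on the momentum iteration. In a neural-network setting the gradient would come from the critic rather than from a hand-written formula.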

