Near real-time online reinforcement learning with synchronous or asynchronous updates.

Authors

Radac Mircea-Bogdan, Chirla Darius-Pavel

Affiliations

Department of Automation and Applied Informatics, Politehnica University of Timisoara, Bvd. V. Parvan, 2, 300223, Timisoara, Romania.

Publication information

Sci Rep. 2025 May 17;15(1):17158. doi: 10.1038/s41598-025-00492-7.

DOI:10.1038/s41598-025-00492-7
PMID:40382371
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12085598/
Abstract

Reinforcement Learning (RL) is a well-known method for learning to control complex and unknown dynamical systems. In this paper, we propose a solution to a major limitation of existing RL schemes: interleaving the environment interaction step with the learning step. The difficulty of reconciling neural network approximation complexity with real-time learning capability is one of several reasons why RL has not been adopted more widely in practical control systems. Our online learning solution with near real-time capability is piloted on a model-reference tracking control problem, where the underlying system state is encoded as a moving window of past output and input signals, expanded with the reference model state and the reference input state. The value function and controller neural networks are trained online by backpropagation, based on the interaction experiences with the system. Two case studies, one in simulation and one experimental involving real hardware, show that the proposed methodology is valid. We compare learning performance and operation times under two popular high-level software packages with automatic differentiation capabilities, under both synchronous and asynchronous updates. The software challenges are discussed in detail based on code runtime numbers. We conclude that, for lower-order systems with relatively fast dynamics and adaptive characteristics, there is a strong incentive to further develop online synchronous RL schemes that come closer to real-time requirements, while asynchronous online RL motivates scaling up the learning method to higher-dimensional systems with faster dynamics, even in non-hard real-time setups.
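The state encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `WindowedState`, the window lengths `n_y` and `n_u`, the zero initialization of the windows, and the ordering of the concatenated vector are all assumptions made for illustration. The sketch keeps the most recent outputs and inputs in fixed-length windows and appends the reference model state and the reference input to form the vector that the value function and controller networks would receive.

```python
from collections import deque


class WindowedState:
    """Hypothetical helper: moving window of past outputs y and inputs u."""

    def __init__(self, n_y: int, n_u: int) -> None:
        # Windows start at zero (an assumption); the oldest sample is dropped automatically.
        self.y_hist = deque([0.0] * n_y, maxlen=n_y)
        self.u_hist = deque([0.0] * n_u, maxlen=n_u)

    def push(self, y: float, u: float) -> None:
        # The newest sample goes to the front of each window.
        self.y_hist.appendleft(float(y))
        self.u_hist.appendleft(float(u))

    def vector(self, x_ref, r):
        # Extended state: [past outputs, past inputs, reference model state, reference input].
        return (
            list(self.y_hist)
            + list(self.u_hist)
            + [float(v) for v in x_ref]
            + [float(v) for v in r]
        )


state = WindowedState(n_y=3, n_u=2)
for y, u in [(0.1, 1.0), (0.2, 0.5), (0.3, 0.0)]:
    state.push(y, u)

s = state.vector(x_ref=[0.25], r=[1.0])
print(s)  # [0.3, 0.2, 0.1, 0.0, 0.5, 0.25, 1.0]
```

With three past outputs, two past inputs, a scalar reference model state, and a scalar reference input, the extended state has seven entries. The window lengths would in practice be chosen to match the order of the controlled system; the values here are arbitrary.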

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/087b/12085598/31444e5b965f/41598_2025_492_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/087b/12085598/04c5623b43cc/41598_2025_492_Figa_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/087b/12085598/acd33be02753/41598_2025_492_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/087b/12085598/2189558251bf/41598_2025_492_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/087b/12085598/a491623784d8/41598_2025_492_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/087b/12085598/7a6ed0984317/41598_2025_492_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/087b/12085598/8c38abb60816/41598_2025_492_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/087b/12085598/886d9df7e88e/41598_2025_492_Fig6_HTML.jpg