Near real-time online reinforcement learning with synchronous or asynchronous updates.

Author information

Radac Mircea-Bogdan, Chirla Darius-Pavel

Affiliation

Department of Automation and Applied Informatics, Politehnica University of Timisoara, Bvd. V. Parvan, 2, 300223, Timisoara, Romania.

Publication information

Sci Rep. 2025 May 17;15(1):17158. doi: 10.1038/s41598-025-00492-7.

DOI:10.1038/s41598-025-00492-7

PMID:40382371

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12085598/

Abstract

Reinforcement Learning (RL) is a well-known method for learning to control complex and unknown dynamical systems. In this paper, we propose a solution to a major limitation of existing RL schemes: interleaving the environment interaction step with the learning step. The difficulty of reconciling neural network approximation complexity with real-time learning capability is one of several reasons why RL has not been adopted more widely in practical control systems. Our online learning solution with near real-time capability is demonstrated on a model-reference tracking control problem, where the underlying system state is encoded as a moving window of past output and input signals, expanded with the reference model state and the reference input state. The value function and the controller neural networks are trained online by backpropagation, based on the interaction experiences with the system. Two case studies, one in simulation and one experimental involving real hardware, show that the proposed methodology is valid. We compare learning performance and operation times under two popular high-level software packages with automatic differentiation capabilities, under both synchronous and asynchronous updates. The software challenges are discussed in detail based on code runtime measurements. We conclude that for lower-order systems with relatively fast dynamics and adaptive characteristics, there is a strong incentive to further develop online synchronous RL schemes that come closer to meeting real-time requirements, whereas asynchronous online RL motivates scaling up the learning method to higher-dimensional systems with faster dynamics, even in non-hard real-time setups.
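The state encoding mentioned in the abstract (a moving window of past outputs and inputs, extended with the reference model state and the reference input) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name WindowedState, the window lengths n_y and n_u, the zero initialization, and the ordering of the components are all assumptions made for illustration only.

```python
from collections import deque


class WindowedState:
    """Hypothetical helper (not from the paper): keeps the most recent
    n_y outputs and n_u inputs and builds an extended state vector."""

    def __init__(self, n_y, n_u):
        # Assumption: the windows start filled with zeros.
        self.y_hist = deque([0.0] * n_y, maxlen=n_y)
        self.u_hist = deque([0.0] * n_u, maxlen=n_u)

    def push(self, y, u):
        # Newest sample goes to the front; the oldest one is dropped
        # automatically because the deques have a fixed maxlen.
        self.y_hist.appendleft(float(y))
        self.u_hist.appendleft(float(u))

    def vector(self, ref_model_state, ref_input):
        # Extended state: [past outputs, past inputs,
        #                  reference model state, reference input].
        return (
            list(self.y_hist)
            + list(self.u_hist)
            + [float(x) for x in ref_model_state]
            + [float(ref_input)]
        )


# Usage: two samples pushed into windows of length 2.
window = WindowedState(n_y=2, n_u=2)
window.push(y=1.0, u=0.5)
window.push(y=2.0, u=0.7)
state = window.vector(ref_model_state=[0.1], ref_input=1.0)
```

A vector of this kind would serve as the input to both the value function network and the controller network. The window lengths would be chosen so that the measured history captures enough of the unknown system's dynamics.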
