A delay-robust method for enhanced real-time reinforcement learning.

Authors

Xia Bo, Sun Haoyuan, Yuan Bo, Li Zhiheng, Liang Bin, Wang Xueqian

Affiliations

Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China.

Research Institute of Tsinghua University in Shenzhen, Shenzhen, 518057, China.

Publication information

Neural Netw. 2025 Jan;181:106769. doi: 10.1016/j.neunet.2024.106769. Epub 2024 Oct 1.

DOI: 10.1016/j.neunet.2024.106769
PMID: 39395235

Abstract

In reinforcement learning, the Markov Decision Process (MDP) framework typically operates under a blocking paradigm, assuming a static environment during the agent's decision-making and stationary agent behavior while the environment executes its actions. This static model often proves inadequate for real-time tasks, as it lacks the flexibility to handle concurrent changes in both the agent's decision-making process and the environment's dynamic responses. Contemporary solutions, such as linear interpolation or state space augmentation, attempt to address the asynchronous nature of delayed states and actions in real-time environments. However, these methods frequently require precise delay measurements and may fail to fully capture the complexities of delay dynamics. To address these challenges, we introduce a minimal information set that encapsulates concurrent information during agent-environment interactions, serving as the foundation of our real-time decision-making framework. The traditional blocking-mode MDP is then reformulated as a Minimal Information State Markov Decision Process (MISMDP), aligning more closely with the demands of real-time environments. Within this MISMDP framework, we propose the "Minimal information set for Real-time tasks using Actor-Critic" (MRAC), a general approach for addressing delay issues in real-time tasks, supported by a rigorous theoretical analysis of Q-function convergence. Extensive experiments across both discrete and continuous action space environments demonstrate that MRAC outperforms state-of-the-art algorithms, delivering superior performance and generalization in managing delays within real-time tasks.
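
As a rough illustration of the state space augmentation baseline mentioned in the abstract (not the paper's MRAC method), the sketch below wraps a toy environment with a fixed action delay and appends the queue of pending actions to the observation. The names `ToyEnv` and `DelayedAugmentedEnv`, the delay of 2 steps, the no-op action, and the toy dynamics are all assumptions made for illustration; none of them come from the paper.

```python
from collections import deque


class ToyEnv:
    """Hypothetical 1-D environment: the state is an integer position."""

    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # The action (-1, 0, or +1) shifts the position;
        # the reward favors staying near 0.
        self.pos += action
        return self.pos, -abs(self.pos)


class DelayedAugmentedEnv:
    """Applies each action `delay` steps late and exposes the pending
    actions as part of the observation (state space augmentation).
    Assumes the delay is constant and known in advance."""

    def __init__(self, env, delay, noop=0):
        self.env = env
        self.delay = delay
        self.noop = noop
        self.pending = deque()

    def reset(self):
        obs = self.env.reset()
        # Before any agent action arrives, the queue holds no-op actions.
        self.pending = deque([self.noop] * self.delay)
        return (obs, tuple(self.pending))

    def step(self, action):
        # Enqueue the new action, then execute the oldest queued one.
        self.pending.append(action)
        executed = self.pending.popleft()
        obs, reward = self.env.step(executed)
        return (obs, tuple(self.pending)), reward


env = DelayedAugmentedEnv(ToyEnv(), delay=2)
trajectory = [env.reset()]
for a in (1, 1, -1):
    state, reward = env.step(a)
    trajectory.append(state)
```

Because the augmented observation carries the actions still in flight, the agent can condition on them, but this construction only works when the delay is known and fixed, which is the kind of limitation the abstract attributes to such methods.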
