

ACERAC: Efficient Reinforcement Learning in Fine Time Discretization.

Authors

Lyskawa Jakub, Wawrzynski Pawel

Publication

IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2719-2731. doi: 10.1109/TNNLS.2022.3190973. Epub 2024 Feb 5.

DOI: 10.1109/TNNLS.2022.3190973
PMID: 35857727
Abstract

One of the main goals of reinforcement learning (RL) is to provide a way for physical machines to learn optimal behavior instead of being programmed. However, effective control of the machines usually requires fine time discretization. The most common RL methods apply independent random elements to each action, which is not suitable in that setting. It is not feasible because it causes the controlled system to jerk and does not ensure sufficient exploration since a single action is not long enough to create a significant experience that could be translated into policy improvement. In our view, these are the main obstacles that prevent the application of RL in contemporary control systems. To address these pitfalls, in this article, we introduce an RL framework and adequate analytical tools for actions that may be stochastically dependent in subsequent time instances. We also introduce an RL algorithm that approximately optimizes a policy that produces such actions. It applies experience replay (ER) to adjust the likelihood of sequences of previous actions to optimize expected n-step returns that the policy yields. The efficiency of this algorithm is verified against four other RL methods [continuous deep advantage updating (CDAU), proximal policy optimization (PPO), soft actor-critic (SAC), and actor-critic with ER (ACER)] in four simulated learning control problems (Ant, HalfCheetah, Hopper, and Walker2D) in diverse time discretization. The algorithm introduced here outperforms the competitors in most cases considered.
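ACERAC's exact noise model is specified in the paper; the general idea behind "actions that may be stochastically dependent in subsequent time instances" can be illustrated with a minimal sketch, assuming an AR(1) (autocorrelated Gaussian) perturbation in place of i.i.d. exploration noise — the coefficient `alpha` and scale `sigma` below are illustrative, not the paper's parameters:

```python
import numpy as np

def independent_noise(steps, dim, sigma=0.3, seed=0):
    """i.i.d. Gaussian exploration: a fresh, uncorrelated draw every control step."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, sigma, size=(steps, dim))

def autocorrelated_noise(steps, dim, sigma=0.3, alpha=0.9, seed=0):
    """AR(1) exploration noise: eps_t = alpha*eps_{t-1} + sqrt(1-alpha^2)*xi_t.
    The stationary variance is still sigma^2, but consecutive perturbations are
    correlated, so the perturbed action sequence stays coherent over many short
    time steps instead of jerking."""
    rng = np.random.default_rng(seed)
    eps = np.zeros((steps, dim))
    for t in range(1, steps):
        xi = rng.normal(0.0, sigma, size=dim)
        eps[t] = alpha * eps[t - 1] + np.sqrt(1.0 - alpha**2) * xi
    return eps

# Mean squared step-to-step change ("jerk") is far smaller for correlated noise:
# 2*sigma^2 for i.i.d. versus 2*sigma^2*(1 - alpha) for AR(1).
iid = independent_noise(10_000, 1)
ar1 = autocorrelated_noise(10_000, 1)
jerk = lambda e: float(np.mean(np.diff(e, axis=0) ** 2))
print(jerk(iid) > jerk(ar1))  # True: i.i.d. noise jerks roughly 10x more here
```

At fine discretization this distinction is exactly the abstract's point: independent per-step noise averages out before it produces a meaningful excursion, while correlated noise sustains exploration across consecutive steps.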

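The expected n-step returns that the ER updates optimize follow the standard definition; a minimal sketch (the discount factor, rewards, and bootstrap value below are illustrative, not taken from the paper):

```python
def n_step_return(rewards, v_boot, gamma=0.99):
    """Discounted n-step return:
    G = r_0 + gamma*r_1 + ... + gamma^(n-1)*r_{n-1} + gamma^n * V(s_n),
    where v_boot = V(s_n) is the critic's bootstrap value at the n-th state."""
    g = 0.0
    for i, r in enumerate(rewards):
        g += (gamma ** i) * r
    return g + (gamma ** len(rewards)) * v_boot

# 3-step example: 1 + 0.5*1 + 0.25*1 + 0.125*10 = 3.0
print(n_step_return([1.0, 1.0, 1.0], v_boot=10.0, gamma=0.5))  # 3.0
```

Longer horizons (larger n) matter at fine discretization because each individual step carries little reward signal on its own.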

Similar Articles

1. ACERAC: Efficient Reinforcement Learning in Fine Time Discretization.
   IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2719-2731. doi: 10.1109/TNNLS.2022.3190973. Epub 2024 Feb 5.
2. Stochastic Integrated Actor-Critic for Deep Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2024 May;35(5):6654-6666. doi: 10.1109/TNNLS.2022.3212273. Epub 2024 May 2.
3. Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples With On-Policy Experiences.
   IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):3121-3129. doi: 10.1109/TNNLS.2022.3174051. Epub 2024 Feb 29.
4. Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors.
   IEEE Trans Neural Netw Learn Syst. 2022 Nov;33(11):6584-6598. doi: 10.1109/TNNLS.2021.3082568. Epub 2022 Oct 27.
5. Relative Entropy Regularized Sample-Efficient Reinforcement Learning With Continuous Actions.
   IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):475-485. doi: 10.1109/TNNLS.2023.3329513. Epub 2025 Jan 7.
6. A priority experience replay actor-critic algorithm using self-attention mechanism for strategy optimization of discrete problems.
   PeerJ Comput Sci. 2024 Jun 28;10:e2161. doi: 10.7717/peerj-cs.2161. eCollection 2024.
7. Characterizing Motor Control of Mastication With Soft Actor-Critic.
   Front Hum Neurosci. 2020 May 26;14:188. doi: 10.3389/fnhum.2020.00188. eCollection 2020.
8. Path Planning for Multi-Arm Manipulators Using Deep Reinforcement Learning: Soft Actor-Critic with Hindsight Experience Replay.
   Sensors (Basel). 2020 Oct 19;20(20):5911. doi: 10.3390/s20205911.
9. Actor-Critic Learning Control With Regularization and Feature Selection in Policy Gradient Estimation.
   IEEE Trans Neural Netw Learn Syst. 2021 Mar;32(3):1217-1227. doi: 10.1109/TNNLS.2020.2981377. Epub 2021 Mar 1.
10. Supervised-actor-critic reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units.
    BMC Med Inform Decis Mak. 2020 Jul 9;20(Suppl 3):124. doi: 10.1186/s12911-020-1120-5.