
Reward-Adaptive Reinforcement Learning: Dynamic Policy Gradient Optimization for Bipedal Locomotion.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7686-7695. doi: 10.1109/TPAMI.2022.3223407. Epub 2023 May 5.

DOI:10.1109/TPAMI.2022.3223407
PMID:36409817
Abstract

Controlling a non-statically bipedal robot is challenging due to the complex dynamics and multi-criterion optimization involved. Recent works have demonstrated the effectiveness of deep reinforcement learning (DRL) for simulation and physical robots. In these methods, the rewards from different criteria are normally summed to learn a scalar function. However, a scalar is less informative and may be insufficient to derive effective information for each reward channel from the complex hybrid rewards. In this work, we propose a novel reward-adaptive reinforcement learning method for biped locomotion, allowing the control policy to be simultaneously optimized by multiple criteria using a dynamic mechanism. The proposed method applies a multi-head critic to learn a separate value function for each reward component, leading to hybrid policy gradients. We further propose dynamic weight, allowing each component to optimize the policy with different priorities. This hybrid and dynamic policy gradient (HDPG) design makes the agent learn more efficiently. We show that the proposed method outperforms summed-up-reward approaches and is able to transfer to physical robots. The MuJoCo results further demonstrate the effectiveness and generalization of HDPG.
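The abstract describes a multi-head critic whose per-component value estimates are combined into a policy gradient with dynamic weights. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not say how the weights are computed, so `dynamic_weights` (a softmax that gives higher priority to reward components with lower current value estimates), the linear critic heads, and every name and hyperparameter below are illustrative assumptions.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


class MultiHeadLinearCritic:
    """Toy critic with one linear Q-head per reward component:
    Q_k(s, a) = w_k . [s, a]. Linear heads are an assumption made for brevity."""

    def __init__(self, state_dim, action_dim, num_heads, seed=0):
        rng = np.random.default_rng(seed)
        self.action_dim = action_dim
        self.W = rng.normal(scale=0.1, size=(num_heads, state_dim + action_dim))

    def q_values(self, state, action):
        # One value estimate per reward component, shape (num_heads,).
        return self.W @ np.concatenate([state, action])

    def action_gradients(self):
        # dQ_k/da for a linear head is the action part of w_k,
        # shape (num_heads, action_dim).
        return self.W[:, -self.action_dim:]

    def td_update(self, state, action, rewards, next_state, next_action,
                  gamma=0.99, lr=0.01):
        # Each head is regressed toward its own reward channel's TD target.
        x = np.concatenate([state, action])
        target = rewards + gamma * self.q_values(next_state, next_action)
        td_error = target - self.W @ x
        self.W += lr * np.outer(td_error, x)
        return td_error


def dynamic_weights(q_values, temperature=1.0):
    # Hypothetical priority rule (not taken from the paper):
    # components with lower value estimates receive larger weights.
    return softmax(-q_values / temperature)


def hybrid_action_gradient(critic, state, action, temperature=1.0):
    """Weighted sum of per-component action gradients, as used in a
    deterministic-policy-gradient style actor update."""
    weights = dynamic_weights(critic.q_values(state, action), temperature)
    return weights @ critic.action_gradients(), weights


critic = MultiHeadLinearCritic(state_dim=3, action_dim=2, num_heads=4)
state, action = np.ones(3), np.zeros(2)
grad, weights = hybrid_action_gradient(critic, state, action)
```

With uniform weights the combined gradient reduces to the gradient of the summed value function scaled by 1/K, which corresponds to the summed-reward baseline the abstract compares against. Non-uniform weights let individual reward components steer the policy update with different priorities.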

Similar articles

1
Reward-Adaptive Reinforcement Learning: Dynamic Policy Gradient Optimization for Bipedal Locomotion.
IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7686-7695. doi: 10.1109/TPAMI.2022.3223407. Epub 2023 May 5.
2
A parallel heterogeneous policy deep reinforcement learning algorithm for bipedal walking motion design.
Front Neurorobot. 2023 Aug 8;17:1205775. doi: 10.3389/fnbot.2023.1205775. eCollection 2023.
3
Multimodal bipedal locomotion generation with passive dynamics deep reinforcement learning.
Front Neurorobot. 2023 Jan 23;16:1054239. doi: 10.3389/fnbot.2022.1054239. eCollection 2022.
4
A Multi-Agent Reinforcement Learning Method for Omnidirectional Walking of Bipedal Robots.
Biomimetics (Basel). 2023 Dec 16;8(8):616. doi: 10.3390/biomimetics8080616.
5
Adaptive Gait Acquisition through Learning Dynamic Stimulus Instinct of Bipedal Robot.
Biomimetics (Basel). 2024 May 22;9(6):310. doi: 10.3390/biomimetics9060310.
6
Biped Robots Control in Gusty Environments with Adaptive Exploration Based DDPG.
Biomimetics (Basel). 2024 Jun 8;9(6):346. doi: 10.3390/biomimetics9060346.
7
Intelligent control of self-driving vehicles based on adaptive sampling supervised actor-critic and human driving experience.
Math Biosci Eng. 2024 May 24;21(5):6077-6096. doi: 10.3934/mbe.2024267.
8
Deep Reinforcement Learning on Autonomous Driving Policy With Auxiliary Critic Network.
IEEE Trans Neural Netw Learn Syst. 2023 Jul;34(7):3680-3690. doi: 10.1109/TNNLS.2021.3116063. Epub 2023 Jul 6.
9
Hybrid Bipedal Locomotion Based on Reinforcement Learning and Heuristics.
Micromachines (Basel). 2022 Oct 7;13(10):1688. doi: 10.3390/mi13101688.
10
Deep Deterministic Policy Gradient-Based Autonomous Driving for Mobile Robots in Sparse Reward Environments.
Sensors (Basel). 2022 Dec 7;22(24):9574. doi: 10.3390/s22249574.