Deep Reinforcement Learning Based Trajectory Planning Under Uncertain Constraints.

Authors

Chen Lienhung, Jiang Zhongliang, Cheng Long, Knoll Alois C, Zhou Mingchuan

Affiliations

Department of Computer Science, Technische Universität München, Munich, Germany.

College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, China.

Publication Information

Front Neurorobot. 2022 May 2;16:883562. doi: 10.3389/fnbot.2022.883562. eCollection 2022.

Abstract

With advances in algorithms, deep reinforcement learning (DRL) offers solutions to trajectory planning in uncertain environments. Unlike traditional trajectory planning, which requires considerable effort to tackle complicated high-dimensional problems, DRL enables the robot manipulator to autonomously learn and discover optimal trajectories by interacting with the environment. In this article, we present state-of-the-art DRL-based collision-avoidance trajectory planning for uncertain environments such as a safe human-coexistent environment. Since the robot manipulator operates in high-dimensional continuous state-action spaces, the model-free, policy-gradient-based soft actor-critic (SAC) and deep deterministic policy gradient (DDPG) frameworks are adapted to our scenario for comparison. To assess our proposal, we simulate a 7-DOF Panda (Franka Emika) robot manipulator in the PyBullet physics engine and evaluate its trajectory planning in terms of reward, loss, safe rate, and accuracy. Finally, our results show the effectiveness of state-of-the-art DRL algorithms for trajectory planning under uncertain environments, achieving zero collisions after 5,000 episodes of training.
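
The abstract describes a collision-avoidance reaching task for a simulated 7-DOF Panda in PyBullet, trained with SAC and DDPG and scored by reward, loss, safe rate, and accuracy. The sketch below is an illustrative reconstruction of such an environment, not the authors' code: the goal position GOAL_POS, the obstacle placement, the end-effector link index, and the reward weights are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of the kind of setup the abstract describes:
# a 7-DOF Franka Emika Panda loaded in PyBullet, with a gym-style step() returning a
# shaped reward (distance to goal plus a collision penalty). GOAL_POS, the obstacle,
# EE_LINK, and the reward weights are illustrative assumptions.
import numpy as np
import pybullet as p
import pybullet_data

GOAL_POS = np.array([0.5, 0.0, 0.5])   # assumed Cartesian target for the end effector
N_ARM_JOINTS = 7                        # the Panda's seven revolute arm joints
EE_LINK = 11                            # end-effector link index in the bundled panda URDF

class PandaReachAvoidEnv:
    """Collision-avoidance reaching task in the spirit of the paper's experiments."""

    def __init__(self, gui=False):
        p.connect(p.GUI if gui else p.DIRECT)
        p.setAdditionalSearchPath(pybullet_data.getDataPath())
        p.setGravity(0, 0, -9.81)
        self.robot = p.loadURDF("franka_panda/panda.urdf", useFixedBase=True)
        # A simple static obstacle standing in for the uncertain human/obstacle region.
        self.obstacle = p.loadURDF("cube_small.urdf", [0.4, 0.2, 0.4])

    def _observe(self):
        # Observation: joint positions and velocities, end-effector position, goal position.
        joint_states = p.getJointStates(self.robot, range(N_ARM_JOINTS))
        q = [s[0] for s in joint_states]
        dq = [s[1] for s in joint_states]
        ee = p.getLinkState(self.robot, EE_LINK)[0]
        return np.array(q + dq + list(ee) + list(GOAL_POS), dtype=np.float32)

    def step(self, action):
        # Continuous action: target joint positions (clipped), matching the
        # high-dimensional continuous action space mentioned in the abstract.
        for j, target in enumerate(np.clip(action, -2.8, 2.8)[:N_ARM_JOINTS]):
            p.setJointMotorControl2(self.robot, j, p.POSITION_CONTROL, targetPosition=target)
        p.stepSimulation()

        ee = np.array(p.getLinkState(self.robot, EE_LINK)[0])
        dist_to_goal = np.linalg.norm(ee - GOAL_POS)
        collided = len(p.getContactPoints(self.robot, self.obstacle)) > 0

        # Assumed reward shaping: dense distance term plus a large collision penalty.
        reward = -dist_to_goal - (10.0 if collided else 0.0)
        done = collided or dist_to_goal < 0.05
        return self._observe(), reward, done, {"collision": collided}
```

Wrapped as a standard gym.Env with explicit observation and action spaces, an environment like this could in principle be trained with off-the-shelf SAC and DDPG implementations (for example, the SAC and DDPG classes in stable-baselines3); the paper's own network architectures, reward function, and hyperparameters are not given in the abstract.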

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/357a/9108367/c2e5f82552c1/fnbot-16-883562-g0001.jpg
