
Path Planning for Multi-Arm Manipulators Using Deep Reinforcement Learning: Soft Actor-Critic with Hindsight Experience Replay.

Affiliations

Department of Electrical and Information Engineering, Research Center for Electrical and Information Technology, Seoul National University of Science and Technology, Seoul 01811, Korea.

Applied Robot R&D Department, Korea Institute of Industrial Technology (KITECH), Ansan 15588, Korea.

Publication

Sensors (Basel). 2020 Oct 19;20(20):5911. doi: 10.3390/s20205911.

DOI: 10.3390/s20205911
PMID: 33086774
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7590214/
Abstract

Since path planning for multi-arm manipulators is a complicated, high-dimensional problem, generating effective paths quickly for arbitrarily given start and goal locations of the end effector is not easy. In particular, for deep reinforcement learning-based path planning, the high dimensionality makes it difficult for existing reinforcement learning methods to achieve the efficient exploration that is crucial for successful training. The recently proposed soft actor-critic (SAC) is well known for its good exploration ability, owing to the entropy term in its objective function. Motivated by this, this paper proposes a SAC-based path planning algorithm. Hindsight experience replay (HER) is also employed for sample efficiency, and configuration space augmentation is used to handle the complicated configuration space of the multiple arms. Both simulation and experimental results are given to show the effectiveness of the proposed algorithm, and comparisons demonstrate that it outperforms existing methods.
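The entropy term the abstract credits for SAC's exploration enters the algorithm through its critic target: instead of bootstrapping on a plain next-state value, SAC bootstraps on the soft value, the minimum of two target critics minus α times the log-probability of the sampled next action. A minimal sketch of that target computation (generic SAC, not code from the paper; all names are illustrative):

```python
import numpy as np

def soft_td_target(rewards, q1_next, q2_next, logp_next,
                   gamma=0.99, alpha=0.2, done=None):
    """Entropy-regularized TD target used in SAC's critic update:
        y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))
    The -alpha * log pi term is the entropy bonus that rewards the policy
    for staying stochastic, which is what drives SAC's exploration."""
    done = np.zeros_like(rewards) if done is None else done
    soft_v = np.minimum(q1_next, q2_next) - alpha * logp_next  # soft state value
    return rewards + gamma * (1.0 - done) * soft_v
```

Taking the minimum of two critics is the usual clipped double-Q trick to curb overestimation; with α = 0 the expression reduces to an ordinary TD(0) target.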

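Hindsight experience replay, employed above for sample efficiency, turns failed episodes into useful training data: stored transitions are relabeled with goals the end effector actually reached later in the same episode, so the sparse success reward fires even when the original goal was missed. A minimal sketch of the common "future" relabeling strategy (the dict layout, field names, and 0/−1 sparse reward are assumptions for illustration, not taken from the paper):

```python
import random

def her_relabel(episode, k=4, seed=0):
    """Hindsight relabeling, 'future' strategy: for each transition, sample
    up to k achieved goals from the remainder of the episode and store a
    copy of the transition with its desired goal replaced by that achieved
    goal. Reward is sparse: 0.0 when the (relabeled) goal is achieved,
    otherwise -1.0."""
    rng = random.Random(seed)
    replay = []
    for t, tr in enumerate(episode):
        # always keep the original transition with its original goal
        replay.append({**tr, "reward": 0.0 if tr["achieved_goal"] == tr["desired_goal"] else -1.0})
        # add relabeled copies using goals achieved from step t onward
        future = episode[t:]
        for _ in range(min(k, len(future))):
            new_goal = rng.choice(future)["achieved_goal"]
            replay.append({**tr,
                           "desired_goal": new_goal,
                           "reward": 0.0 if tr["achieved_goal"] == new_goal else -1.0})
    return replay
```

Because every relabeled copy of a transition whose achieved goal matches the sampled goal earns reward 0, the critic sees successful outcomes from the very first episodes, which is what makes HER effective under sparse rewards.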

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6414/7590214/6b63abb7adc1/sensors-20-05911-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6414/7590214/d5205da971f4/sensors-20-05911-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6414/7590214/68c6c5d5d0d9/sensors-20-05911-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6414/7590214/22fa71349b32/sensors-20-05911-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6414/7590214/1f7d9aea362f/sensors-20-05911-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6414/7590214/4608b54197d5/sensors-20-05911-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6414/7590214/c2433a4d9654/sensors-20-05911-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6414/7590214/6bf5920fdf97/sensors-20-05911-g0A1.jpg

Similar Articles

1. Path Planning for Multi-Arm Manipulators Using Deep Reinforcement Learning: Soft Actor-Critic with Hindsight Experience Replay.
   Sensors (Basel). 2020 Oct 19;20(20):5911. doi: 10.3390/s20205911.
2. A Path-Planning Method Based on Improved Soft Actor-Critic Algorithm for Mobile Robots.
   Biomimetics (Basel). 2023 Oct 10;8(6):481. doi: 10.3390/biomimetics8060481.
3. End-to-End AUV Motion Planning Method Based on Soft Actor-Critic.
   Sensors (Basel). 2021 Sep 1;21(17):5893. doi: 10.3390/s21175893.
4. Heuristic Q-learning based on experience replay for three-dimensional path planning of the unmanned aerial vehicle.
   Sci Prog. 2020 Jan-Mar;103(1):36850419879024. doi: 10.1177/0036850419879024. Epub 2019 Sep 30.
5. The Intelligent Path Planning System of Agricultural Robot via Reinforcement Learning.
   Sensors (Basel). 2022 Jun 7;22(12):4316. doi: 10.3390/s22124316.
6. A priority experience replay actor-critic algorithm using self-attention mechanism for strategy optimization of discrete problems.
   PeerJ Comput Sci. 2024 Jun 28;10:e2161. doi: 10.7717/peerj-cs.2161. eCollection 2024.
7. Reinforcement learning-based dynamic obstacle avoidance and integration of path planning.
   Intell Serv Robot. 2021;14(5):663-677. doi: 10.1007/s11370-021-00387-2. Epub 2021 Oct 6.
8. Real-time route planning of unmanned aerial vehicles based on improved soft actor-critic algorithm.
   Front Neurorobot. 2022 Dec 5;16:1025817. doi: 10.3389/fnbot.2022.1025817. eCollection 2022.
9. Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples With On-Policy Experiences.
   IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):3121-3129. doi: 10.1109/TNNLS.2022.3174051. Epub 2024 Feb 29.
10. Path Planning of a Mobile Robot for a Dynamic Indoor Environment Based on an SAC-LSTM Algorithm.
   Sensors (Basel). 2023 Dec 13;23(24):9802. doi: 10.3390/s23249802.

Cited By

1. An improved joint space Astar algorithm for a 6-DOF manipulator with pre-planning strategy.
   Sci Rep. 2025 May 25;15(1):18164. doi: 10.1038/s41598-025-01010-5.
2. Compliant Motion Planning Integrating Human Skill for Robotic Arm Collecting Tomato Bunch Based on Improved DDPG.
   Plants (Basel). 2025 Feb 20;14(5):634. doi: 10.3390/plants14050634.
3. Autonomous Driving of Mobile Robots in Dynamic Environments Based on Deep Deterministic Policy Gradient: Reward Shaping and Hindsight Experience Replay.
   Biomimetics (Basel). 2024 Jan 13;9(1):51. doi: 10.3390/biomimetics9010051.
4. Path Planning for Unmanned Surface Vehicles with Strong Generalization Ability Based on Improved Proximal Policy Optimization.
   Sensors (Basel). 2023 Oct 31;23(21):8864. doi: 10.3390/s23218864.
5. Improved Robot Path Planning Method Based on Deep Reinforcement Learning.
   Sensors (Basel). 2023 Jun 15;23(12):5622. doi: 10.3390/s23125622.
6. Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey.
   Sensors (Basel). 2023 Mar 30;23(7):3625. doi: 10.3390/s23073625.
7. A Self-Collision Detection Algorithm of a Dual-Manipulator System Based on GJK and Deep Learning.
   Sensors (Basel). 2023 Jan 3;23(1):523. doi: 10.3390/s23010523.
8. Adaptive Discount Factor for Deep Reinforcement Learning in Continuing Tasks with Uncertainty.
   Sensors (Basel). 2022 Sep 25;22(19):7266. doi: 10.3390/s22197266.
9. Medical Image Segmentation Algorithm for Three-Dimensional Multimodal Using Deep Reinforcement Learning and Big Data Analytics.
   Front Public Health. 2022 Apr 8;10:879639. doi: 10.3389/fpubh.2022.879639. eCollection 2022.
10. Dual-Arm Robot Trajectory Planning Based on Deep Reinforcement Learning under Complex Environment.
   Micromachines (Basel). 2022 Mar 31;13(4):564. doi: 10.3390/mi13040564.

References

1. A Multitasking-Oriented Robot Arm Motion Planning Scheme Based on Deep Reinforcement Learning and Twin Synchro-Control.
   Sensors (Basel). 2020 Jun 21;20(12):3515. doi: 10.3390/s20123515.
2. Learning Mobile Manipulation through Deep Reinforcement Learning.
   Sensors (Basel). 2020 Feb 10;20(3):939. doi: 10.3390/s20030939.
3. An Autonomous Path Planning Model for Unmanned Ships Based on Deep Reinforcement Learning.
   Sensors (Basel). 2020 Jan 11;20(2):426. doi: 10.3390/s20020426.
4. Implementation of a Potential Field-Based Decision-Making Algorithm on Autonomous Vehicles for Driving in Complex Environments.
   Sensors (Basel). 2019 Jul 28;19(15):3318. doi: 10.3390/s19153318.
5. Fast Marching Tree: a Fast Marching Sampling-Based Method for Optimal Motion Planning in Many Dimensions.
   Int J Rob Res. 2015 Jun;34(7):883-921. doi: 10.1177/0278364915577958. Epub 2015 May 18.
6. Human-level control through deep reinforcement learning.
   Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.