
Robotic Arm Trajectory Planning in Dynamic Environments Based on Self-Optimizing Replay Mechanism

Authors

Xu Pengyao, Di Chong, Lv Jiandong, Zhao Peng, Chen Chao, Wang Ruotong

Affiliations

Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China.

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China.

Publication

Sensors (Basel). 2025 Jul 29;25(15):4681. doi: 10.3390/s25154681.

DOI: 10.3390/s25154681
PMID: 40807846
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12349172/
Abstract

In complex dynamic environments, robotic arms face multiple challenges such as real-time environmental changes, high-dimensional state spaces, and strong uncertainties. Trajectory planning tasks based on deep reinforcement learning (DRL) suffer from difficulties in acquiring human expert strategies, low experience utilization (leading to slow convergence), and unreasonable reward function design. To address these issues, this paper designs a neural network-based expert-guided triple experience replay mechanism (NETM) and proposes an improved reward function adapted to dynamic environments. This replay mechanism integrates imitation learning's fast data fitting with DRL's self-optimization to expand limited expert demonstrations and algorithm-generated successes into optimized expert experiences. Experimental results show the expanded expert experience accelerates convergence: in dynamic scenarios, NETM boosts accuracy by over 30% and safe rate by 2.28% compared to baseline algorithms.
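
The paper's code is not reproduced on this page, but the mechanism the abstract describes can be sketched. Below is a minimal, hypothetical Python rendering of a "triple" replay scheme: separate pools for expert demonstrations, algorithm-generated successful episodes, and ordinary transitions, with successes periodically promoted into the expert pool to mimic the abstract's "expansion" of limited demonstrations. The class name, pool split, and sampling ratios are illustrative assumptions, not the authors' NETM implementation.

    import random
    from collections import deque

    class TripleReplayBuffer:
        """Illustrative three-pool replay buffer (not the paper's NETM code).

        Pools: expert demonstrations, self-generated successful episodes,
        and ordinary exploration transitions. Mini-batches mix all three.
        """

        def __init__(self, capacity=100_000, expert_ratio=0.25, success_ratio=0.25):
            self.expert = deque(maxlen=capacity)
            self.success = deque(maxlen=capacity)
            self.regular = deque(maxlen=capacity)
            self.expert_ratio = expert_ratio    # assumed fixed mixing ratios
            self.success_ratio = success_ratio

        def add(self, transition, *, is_expert=False, is_success=False):
            # transition = (state, action, reward, next_state, done)
            if is_expert:
                self.expert.append(transition)
            elif is_success:
                self.success.append(transition)
            else:
                self.regular.append(transition)

        def promote_successes(self):
            # Treat accumulated successes as extra expert data, mimicking the
            # expansion of limited demonstrations into optimized expert experience.
            self.expert.extend(self.success)
            self.success.clear()

        def sample(self, batch_size):
            # Draw from each pool at its ratio; backfill from regular data
            # when a pool is still small early in training.
            n_exp = min(int(batch_size * self.expert_ratio), len(self.expert))
            n_suc = min(int(batch_size * self.success_ratio), len(self.success))
            n_reg = min(batch_size - n_exp - n_suc, len(self.regular))
            batch = (random.sample(list(self.expert), n_exp)
                     + random.sample(list(self.success), n_suc)
                     + random.sample(list(self.regular), n_reg))
            random.shuffle(batch)
            return batch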

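The abstract also credits part of the gain to an improved reward function adapted to dynamic environments. The paper's exact formulation is not shown on this page; the sketch below is only a common form of such shaping (progress toward the goal plus a safety penalty near moving obstacles), with all weights and thresholds chosen arbitrarily for illustration.

    def shaped_reward(dist_to_goal, prev_dist_to_goal, dist_to_obstacle,
                      collided, reached,
                      w_progress=1.0, w_safety=0.5, safe_margin=0.1,
                      r_goal=10.0, r_collision=-10.0):
        # Illustrative shaping only; the weights and terms are assumptions,
        # not the reward design proposed in the paper.
        if collided:
            return r_collision                        # terminal penalty
        if reached:
            return r_goal                             # terminal bonus
        progress = prev_dist_to_goal - dist_to_goal   # reward approaching the goal
        # Penalize intrusion into a safety margin around (moving) obstacles.
        safety = -max(0.0, safe_margin - dist_to_obstacle) / safe_margin
        return w_progress * progress + w_safety * safety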

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b063/12349172/92ae49b4b8dc/sensors-25-04681-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b063/12349172/c6267b34e699/sensors-25-04681-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b063/12349172/bbb3262e34f4/sensors-25-04681-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b063/12349172/453891fa15d3/sensors-25-04681-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b063/12349172/6bf01aec380b/sensors-25-04681-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b063/12349172/27d8afee8dcd/sensors-25-04681-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b063/12349172/910825111bc4/sensors-25-04681-g007a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b063/12349172/d78e27c87583/sensors-25-04681-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b063/12349172/62fb50871d3b/sensors-25-04681-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b063/12349172/9094bd108aea/sensors-25-04681-g010.jpg

Similar Articles

1
Robotic Arm Trajectory Planning in Dynamic Environments Based on Self-Optimizing Replay Mechanism.
Sensors (Basel). 2025 Jul 29;25(15):4681. doi: 10.3390/s25154681.
2
Research on AGV Path Planning Based on Improved DQN Algorithm.
Sensors (Basel). 2025 Jul 29;25(15):4685. doi: 10.3390/s25154685.
3
Shapley value-driven multi-modal deep reinforcement learning for complex decision-making.
Neural Netw. 2025 Nov;191:107650. doi: 10.1016/j.neunet.2025.107650. Epub 2025 Jun 21.
4
Actor critic with experience replay-based automatic treatment planning for prostate cancer intensity modulated radiotherapy.
Med Phys. 2025 Jul;52(7):e17915. doi: 10.1002/mp.17915. Epub 2025 May 31.
5
Prescription of Controlled Substances: Benefits and Risks
6
Short-Term Memory Impairment
7
Design of a dynamic trust management and defense decision system for shared vehicle data based on blockchain and deep reinforcement learning.
Sci Rep. 2025 Jul 22;15(1):26662. doi: 10.1038/s41598-025-11511-y.
8
Health professionals' experience of teamwork education in acute hospital settings: a systematic review of qualitative literature.
JBI Database System Rev Implement Rep. 2016 Apr;14(4):96-137. doi: 10.11124/JBISRIR-2016-1843.
9
Representation-driven sampling and adaptive policy resetting for improving multi-agent reinforcement learning.
Neural Netw. 2025 Jul 15;192:107875. doi: 10.1016/j.neunet.2025.107875.
10
2HR-Net VSLAM: Robust visual SLAM based on dual high-reliability feature matching in dynamic environments.
PLoS One. 2025 Jul 18;20(7):e0328052. doi: 10.1371/journal.pone.0328052. eCollection 2025.

Cited By

1
Comparative Benchmark of Sampling-Based and DRL Motion Planning Methods for Industrial Robotic Arms.
Sensors (Basel). 2025 Aug 25;25(17):5282. doi: 10.3390/s25175282.

References

1
A Survey of Imitation Learning: Algorithms, Recent Developments, and Challenges.
IEEE Trans Cybern. 2024 Dec;54(12):7173-7186. doi: 10.1109/TCYB.2024.3395626. Epub 2024 Nov 27.
2
Prioritized experience replay in path planning via multi-dimensional transition priority fusion.
Front Neurorobot. 2023 Nov 15;17:1281166. doi: 10.3389/fnbot.2023.1281166. eCollection 2023.
3
A Survey on Deep Reinforcement Learning Algorithms for Robotic Manipulation.
Sensors (Basel). 2023 Apr 5;23(7):3762. doi: 10.3390/s23073762.
4
Dual-Arm Robot Trajectory Planning Based on Deep Reinforcement Learning under Complex Environment.
Micromachines (Basel). 2022 Mar 31;13(4):564. doi: 10.3390/mi13040564.