Xu Pengyao, Di Chong, Lv Jiandong, Zhao Peng, Chen Chao, Wang Ruotong
Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China.
College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China.
Sensors (Basel). 2025 Jul 29;25(15):4681. doi: 10.3390/s25154681.
In complex dynamic environments, robotic arms face multiple challenges, including real-time environmental changes, high-dimensional state spaces, and strong uncertainty. Trajectory planning based on deep reinforcement learning (DRL) suffers from three difficulties: human expert strategies are hard to acquire, experience utilization is low (leading to slow convergence), and reward functions are poorly designed. To address these issues, this paper designs a neural network-based expert-guided triple experience replay mechanism (NETM) and proposes an improved reward function adapted to dynamic environments. The replay mechanism combines imitation learning's fast data fitting with DRL's self-optimization, expanding limited expert demonstrations and the algorithm's own successful episodes into optimized expert experience. Experimental results show that the expanded expert experience accelerates convergence: in dynamic scenarios, NETM improves accuracy by over 30% and the safety rate by 2.28% compared with baseline algorithms.
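The abstract does not specify how the three experience pools are organized or sampled; the following Python sketch is one plausible reading of a "triple" replay mechanism, with separate pools for expert demonstrations, self-generated successful episodes promoted to expert experience, and ordinary transitions. The class name, capacities, and sampling ratios are illustrative assumptions, not values from the paper.

```python
import random
from collections import deque


class TripleReplayBuffer:
    """Hypothetical triple replay buffer: expert demonstrations,
    successful rollouts promoted to 'optimized expert' experience,
    and ordinary transitions, sampled in a fixed mixing ratio.
    Ratios and capacities below are assumptions for illustration."""

    def __init__(self, capacity=100_000, ratios=(0.3, 0.3, 0.4)):
        self.expert = deque(maxlen=capacity)    # human expert demonstrations
        self.success = deque(maxlen=capacity)   # promoted successful episodes
        self.ordinary = deque(maxlen=capacity)  # all other transitions
        self.ratios = ratios                    # expert : success : ordinary

    def add(self, transition, *, expert=False, success=False):
        """Route a transition to the appropriate pool."""
        if expert:
            self.expert.append(transition)
        elif success:
            self.success.append(transition)
        else:
            self.ordinary.append(transition)

    def sample(self, batch_size):
        """Draw a mixed batch; a pool that is still too small falls back
        on the ordinary pool so training can begin before many
        demonstrations or successes have accumulated."""
        batch = []
        for pool, ratio in zip(
            (self.expert, self.success, self.ordinary), self.ratios
        ):
            n = max(1, int(batch_size * ratio))
            source = pool if len(pool) >= n else self.ordinary
            if source:
                batch.extend(random.sample(list(source), min(n, len(source))))
        return batch
```

Under this reading, the promotion of the agent's own successful episodes into the expert pool is what lets a small set of human demonstrations grow over training, which would explain the faster convergence the abstract reports.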