Intermittent Stop-Move Motion Planning for Dual-Arm Tomato Harvesting Robot in Greenhouse Based on Deep Reinforcement Learning

Author information

Li Yajun, Feng Qingchun, Zhang Yifan, Peng Chuanlang, Zhao Chunjiang

Affiliations

College of Mechanical and Electrical Engineering, Hunan Agriculture University, Changsha 410128, China.

Intelligent Equipment Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China.

Publication information

Biomimetics (Basel). 2024 Feb 10;9(2):105. doi: 10.3390/biomimetics9020105.

DOI:10.3390/biomimetics9020105

PMID:38392151

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10886618/

Abstract

Intermittent stop-move motion planning is essential for optimizing the efficiency of harvesting robots in greenhouse settings. Addressing issues like frequent stops, missed targets, and uneven task allocation, this study introduced a novel intermittent motion planning model using deep reinforcement learning for a dual-arm harvesting robot vehicle. Initially, the model gathered real-time coordinate data of target fruits on both sides of the robot, and projected these coordinates onto a two-dimensional map. Subsequently, the DDPG (Deep Deterministic Policy Gradient) algorithm was employed to generate parking node sequences for the robotic vehicle. A dynamic simulation environment, designed to mimic industrial greenhouse conditions, was developed to enhance the DDPG to generalize to real-world scenarios. Simulation results have indicated that the convergence performance of the DDPG model was improved by 19.82% and 33.66% compared to the SAC and TD3 models, respectively. In tomato greenhouse experiments, the model reduced vehicle parking frequency by 46.5% and 36.1% and decreased arm idleness by 42.9% and 33.9%, compared to grid-based and area division algorithms, without missing any targets. The average time required to generate planned paths was 6.9 ms. These findings demonstrate that the parking planning method proposed in this paper can effectively improve the overall harvesting efficiency and allocate tasks for a dual-arm harvesting robot in a more rational manner.
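The first step of the pipeline (projecting fruit coordinates onto a two-dimensional map) and the quantities the abstract reports (parking frequency, missed targets, arm idleness) can be illustrated with a small sketch. This is not the authors' implementation and contains no DDPG training. The names `Fruit`, `project_to_map`, and `evaluate_stops`, the symmetric reach window of 0.4 m, and the idleness measure (difference in fruit counts between the two arms at each stop) are all hypothetical choices made for illustration. The abstract does not specify the state representation, reward, or arm workspace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fruit:
    x: float  # position along the row, in metres
    y: float  # lateral offset; y < 0 is the left side, y >= 0 the right (assumed convention)
    z: float  # height, in metres; discarded by the projection


def project_to_map(fruits):
    """Project 3D fruit coordinates to (x, side) pairs on a 2D row map."""
    return [(f.x, "left" if f.y < 0 else "right") for f in fruits]


def evaluate_stops(targets, stops, reach=0.4):
    """Score a candidate parking sequence.

    Each target is assigned to the first stop s with |x - s| <= reach
    (a hypothetical arm workspace). Returns the number of stops, the
    number of targets no stop can reach, and an idleness proxy: the
    per-stop difference between left-arm and right-arm fruit counts.
    """
    per_stop = [{"left": 0, "right": 0} for _ in stops]
    missed = 0
    for x, side in targets:
        for i, s in enumerate(stops):
            if abs(x - s) <= reach:
                per_stop[i][side] += 1
                break
        else:
            # No stop covers this target.
            missed += 1
    idle = sum(abs(c["left"] - c["right"]) for c in per_stop)
    return {"stops": len(stops), "missed": missed, "idle": idle}


# Hypothetical data: four fruits, two candidate parking nodes.
fruits = [
    Fruit(0.1, -0.3, 1.0),
    Fruit(0.3, 0.3, 1.2),
    Fruit(1.0, 0.3, 0.9),
    Fruit(1.2, 0.3, 1.1),
]
result = evaluate_stops(project_to_map(fruits), stops=[0.2, 1.1])
print(result)
```

A learned policy such as DDPG would propose the `stops` sequence. A scoring function of this kind is one plausible ingredient of a reward signal, under the assumptions stated above.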
