Graduate School of Science and Technology, University of Tsukuba, Tennodai 1-1-1, Tsukuba 305-8577, Japan.
Department of Agricultural Engineering, University of Peradeniya, Kandy 20400, Sri Lanka.
Sensors (Basel). 2023 Apr 7;23(8):3810. doi: 10.3390/s23083810.
Recognition and 3D positional estimation of apples during harvesting from a robotic platform on a moving vehicle remain challenging. Fruit clusters, branches, foliage, low resolution, and varying illumination are unavoidable and cause errors under different environmental conditions. Therefore, this research aimed to develop a recognition system trained on datasets from an augmented, complex apple orchard. The recognition system was evaluated using deep learning algorithms based on convolutional neural networks (CNNs). The dynamic accuracy of modern artificial neural networks in providing 3D coordinates for robotic-arm deployment was investigated at different forward speeds of an experimental vehicle, to compare recognition and tracking localization accuracy. In this study, a RealSense D455 RGB-D camera was selected to acquire the 3D coordinates of each detected and counted apple attached to artificial trees placed in the field, arranged in a specially designed structure for ease of robotic harvesting. Together with the 3D camera, state-of-the-art YOLO (You Only Look Once) models, namely YOLOv4, YOLOv5, and YOLOv7, as well as EfficientDet, were utilized for object detection. The Deep SORT algorithm was employed to track and count the detected apples at perpendicular (90°), 15°, and 30° camera orientations. The 3D coordinates of each tracked apple were obtained when it passed a reference line set in the middle of the image frame as the on-board camera moved with the vehicle. To optimize harvesting, the accuracy of the 3D coordinates was compared across three forward speeds (0.052 m/s, 0.069 m/s, and 0.098 m/s) and three camera angles (15°, 30°, and 90°). The mean average precision (mAP@0.5) values of YOLOv4, YOLOv5, YOLOv7, and EfficientDet were 0.84, 0.86, 0.905, and 0.775, respectively. The lowest root mean square error (RMSE), 1.54 cm, was obtained for apples detected by EfficientDet at a 15° orientation and a speed of 0.098 m/s.
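The reference-line step described above can be sketched as follows. This is a minimal, hypothetical illustration only: track IDs stand in for Deep SORT output, the pinhole intrinsics are placeholder values, and a real pipeline would instead deproject aligned depth pixels with the RealSense SDK (e.g. `rs2_deproject_pixel_to_point` in `pyrealsense2`).

```python
# Sketch: record one 3D coordinate per tracked apple the first time its
# bounding-box centre crosses a reference line set at the middle of the
# image frame. All numeric constants are assumptions, not the paper's values.

FRAME_WIDTH = 1280
REF_X = FRAME_WIDTH // 2                      # vertical reference line (pixels)
FX, FY, CX, CY = 640.0, 640.0, 640.0, 360.0   # assumed pinhole intrinsics

def deproject(u, v, depth_m):
    """Back-project pixel (u, v) at depth_m metres to camera coordinates."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return (x, y, depth_m)

def record_crossings(frames):
    """frames: per-video-frame dicts {track_id: (u, v, depth_m)} for the
    bounding-box centres. Returns {track_id: (x, y, z)} captured once,
    at the frame where each track crosses the reference line."""
    recorded = {}
    prev_u = {}
    for detections in frames:
        for tid, (u, v, d) in detections.items():
            # capture exactly once, when the centre passes REF_X
            if tid not in recorded and prev_u.get(tid, 0) < REF_X <= u:
                recorded[tid] = deproject(u, v, d)
            prev_u[tid] = u
    return recorded
```

As the vehicle moves forward, each apple's track sweeps across the frame, so every apple triggers the capture condition exactly once, giving one 3D coordinate per counted fruit.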
In terms of apple counting, YOLOv5 and YOLOv7 produced higher numbers of detections under outdoor dynamic conditions, achieving a counting accuracy of 86.6%. We conclude that the EfficientDet deep learning algorithm, applied at a 15° orientation with 3D coordinates, can be employed for further robotic-arm development for harvesting apples in a specially designed orchard.
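The two evaluation metrics reported above can be computed as in the sketch below. The exact definitions (e.g. how over-counting is handled) are assumptions, not taken from the paper.

```python
import math

def counting_accuracy(counted, actual):
    """Counted apples as a percentage of ground-truth apples
    (assumed definition; the paper may treat over-counts differently)."""
    return 100.0 * counted / actual

def rmse_cm(predicted, truth):
    """Root mean square error (cm) between predicted and ground-truth
    3D apple positions, both given as lists of (x, y, z) tuples in cm."""
    squared_errors = [
        sum((p - t) ** 2 for p, t in zip(pt_pred, pt_true))
        for pt_pred, pt_true in zip(predicted, truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For example, counting 13 of 15 ground-truth apples yields roughly the 86.6% accuracy reported, and the RMSE aggregates per-apple localization error into a single centimetre-scale figure.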