


Estimating strawberry weight for grading by picking robot with point cloud completion and multimodal fusion network.

Authors

Chen Yiming, Wang Wei, Chen Junchao, Deng Jizhou, Xiang Yuanping, Qiao Bo, Zhu Xinghui, Li Changyun

Affiliations

Hunan Agricultural University, Changsha, 410000, China.

Tencent Music Entertainment, Shenzhen, 518000, China.

Publication

Sci Rep. 2025 Apr 2;15(1):11227. doi: 10.1038/s41598-025-92641-1.

DOI:10.1038/s41598-025-92641-1
PMID:40175474
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11965529/
Abstract

Strawberry grading by picking robots can eliminate manual classification, reducing labor costs and minimizing damage to the fruit. Strawberry size or weight is a key factor in grading, with accurate weight estimation being crucial for proper classification. In this paper, we collected 1521 sets of strawberry RGB-D images using a depth camera and manually measured the weight and size of the strawberries to construct a training dataset for the strawberry weight regression model. To address the issue of incomplete depth images caused by environmental interference with depth cameras, this study proposes a multimodal point cloud completion method specifically designed for symmetrical objects, leveraging RGB images to guide the completion of depth images in the same scene. The method follows a process of locating strawberry pixel regions, calculating centroid coordinates, determining the symmetry axis via PCA, and completing the depth image. Based on this approach, a multimodal fusion regression model for strawberry weight estimation, named MMF-Net, is developed. The model uses the completed point cloud and RGB image as inputs, and extracts features from the RGB image and point cloud by EfficientNet and PointNet, respectively. These features are then integrated at the feature level through gradient blending, realizing the combination of the strengths of both modalities. Using the Percent Correct Weight (PCW) metric as the evaluation standard, this study compares the performance of four traditional machine learning methods, Support Vector Regression (SVR), Multilayer Perceptron (MLP), Linear Regression, and Random Forest Regression, with four point cloud-based deep learning models, PointNet, PointNet++, PointMLP, and Point Cloud Transformer, as well as two image-based deep learning models, EfficientNet and ResNet, on single-modal datasets.
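The completion steps named in the abstract (locate the strawberry region, compute the centroid, find the symmetry axis via PCA, complete the partial depth data) can be illustrated with a minimal NumPy sketch. The function names and the mirror-about-axis completion step below are illustrative assumptions for a symmetric object, not the authors' actual implementation:

```python
import numpy as np

def symmetry_axis(points):
    """Estimate a symmetry axis for a roughly symmetric fruit point cloud.

    points: (N, 3) array of 3D points from the partial strawberry cloud.
    Returns the centroid and the first principal direction; for an
    approximately axially symmetric object this aligns with its axis.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    # PCA via SVD of the centered coordinates: the right singular vector
    # with the largest singular value is the dominant axis of variation.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centroid, vt[0]

def mirror_complete(points, centroid, axis):
    """Naive completion: rotate the partial cloud 180 degrees about the
    symmetry axis and append the result (hypothetical strategy)."""
    a = axis / np.linalg.norm(axis)
    centered = points - centroid
    # Rodrigues rotation by pi about unit axis a: p' = 2 (a . p) a - p
    rotated = 2.0 * (centered @ a)[:, None] * a - centered
    return np.vstack([points, rotated + centroid])
```

A cloud of points scattered along one direction yields that direction as the dominant axis, and the completed cloud has twice as many points as the partial input.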
The results indicate that among traditional machine learning methods, the SVR model achieved the best performance with an accuracy of 77.7% (PCW@0.2). Among deep learning methods, the image-based EfficientNet model obtained the highest accuracy, reaching 85% (PCW@0.2), while the PointNet++ model demonstrated the best performance among point cloud-based models, with an accuracy of 54.3% (PCW@0.2). The proposed multimodal fusion model, MMF-Net, achieved an accuracy of 87.66% (PCW@0.2), significantly outperforming both traditional machine learning methods and single-modal deep learning models in terms of precision.
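PCW@0.2 reads naturally as the fraction of test berries whose predicted weight falls within 20% of the measured weight. A short sketch of that interpretation (the metric's exact definition is given in the paper, not reproduced here):

```python
import numpy as np

def pcw(y_true, y_pred, tol=0.2):
    """Percent Correct Weight: share of samples whose predicted weight
    lies within relative tolerance `tol` of the measured weight.
    This reading of the metric is inferred from the PCW@0.2 notation."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel_err = np.abs(y_pred - y_true) / y_true
    return float((rel_err <= tol).mean())
```

Under this reading, MMF-Net's 87.66% (PCW@0.2) means roughly 88 of every 100 berries were predicted to within 20% of their true weight.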


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/9dacf19a40a3/41598_2025_92641_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/a8513675b5e5/41598_2025_92641_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/b88dc59aa960/41598_2025_92641_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/3dbe77ddb8e7/41598_2025_92641_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/6ef3b64655b6/41598_2025_92641_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/7c1636e505bf/41598_2025_92641_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/11ef8382372a/41598_2025_92641_Figa_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/abcb029fa3cd/41598_2025_92641_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/172f9f50cd14/41598_2025_92641_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/06fdd3d77b20/41598_2025_92641_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/29b786f866d3/41598_2025_92641_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/e874ae857b14/41598_2025_92641_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/27e44e17244f/41598_2025_92641_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/1d5cf4bd79d7/41598_2025_92641_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/62d4/11965529/8d1083cb07d1/41598_2025_92641_Fig13_HTML.jpg

Similar Articles

1
Estimating strawberry weight for grading by picking robot with point cloud completion and multimodal fusion network.
Sci Rep. 2025 Apr 2;15(1):11227. doi: 10.1038/s41598-025-92641-1.
2
Brain tumor segmentation and detection in MRI using convolutional neural networks and VGG16.
Cancer Biomark. 2025 Mar;42(3):18758592241311184. doi: 10.1177/18758592241311184. Epub 2025 Apr 4.
3
Self-Attention Mechanism-Based Head Pose Estimation Network with Fusion of Point Cloud and Image Features.
Sensors (Basel). 2023 Dec 18;23(24):9894. doi: 10.3390/s23249894.
4
Integrating PointNet-Based Model and Machine Learning Algorithms for Classification of Rupture Status of IAs.
Bioengineering (Basel). 2024 Jun 28;11(7):660. doi: 10.3390/bioengineering11070660.
5
A medical image classification method based on self-regularized adversarial learning.
Med Phys. 2024 Nov;51(11):8232-8246. doi: 10.1002/mp.17320. Epub 2024 Jul 30.
6
Comparison of Graph Fitting and Sparse Deep Learning Model for Robot Pose Estimation.
Sensors (Basel). 2022 Aug 29;22(17):6518. doi: 10.3390/s22176518.
7
RAE-Net: a multi-modal neural network based on feature fusion and evidential deep learning algorithm in predicting breast cancer subtypes on DCE-MRI.
Biomed Phys Eng Express. 2025 Feb 25;11(2). doi: 10.1088/2057-1976/adb494.
8
Point Cloud Hand-Object Segmentation Using Multimodal Imaging with Thermal and Color Data for Safe Robotic Object Handover.
Sensors (Basel). 2021 Aug 23;21(16):5676. doi: 10.3390/s21165676.
9
Towards robust multimodal ultrasound classification for liver tumor diagnosis: A generative approach to modality missingness.
Comput Methods Programs Biomed. 2025 Jun;265:108759. doi: 10.1016/j.cmpb.2025.108759. Epub 2025 Mar 30.
10
Multi-objective RGB-D fusion network for non-destructive strawberry trait assessment.
Front Plant Sci. 2025 Mar 12;16:1564301. doi: 10.3389/fpls.2025.1564301. eCollection 2025.

References Cited in This Article

1
Sensory manipulation as a countermeasure to robot teleoperation delays: system and evidence.
Sci Rep. 2024 Feb 21;14(1):4333. doi: 10.1038/s41598-024-54734-1.
2
Multimodal Learning With Transformers: A Survey.
IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12113-12132. doi: 10.1109/TPAMI.2023.3275156. Epub 2023 Sep 5.
3
Deep Learning for 3D Point Clouds: A Survey.
IEEE Trans Pattern Anal Mach Intell. 2021 Dec;43(12):4338-4364. doi: 10.1109/TPAMI.2020.3005434. Epub 2021 Nov 3.