Hu Jing, Fan Chuang, Wang Zhoupu, Ruan Jinglin, Wu Suyin
School of Mathematics and Computer Science, Wuhan Polytechnic University, Wuhan 430024, China.
Sensors (Basel). 2023 Jun 25;23(13):5903. doi: 10.3390/s23135903.
With the increasing popularity of online fruit sales, accurately predicting fruit yields has become crucial for optimizing logistics and storage strategies. However, existing manual vision-based systems and sensor methods have proven inadequate for the complex problem of fruit yield counting, as they struggle with crop overlap and variable lighting conditions. Recently, CNN-based object detection models have emerged as a promising solution in computer vision, but their effectiveness in agricultural scenarios is limited by challenges such as occlusion and variation in appearance among fruits of the same kind. To address this issue, we propose a novel variant model that combines the self-attention mechanism of the Vision Transformer, a non-CNN network architecture, with YOLOv7, a state-of-the-art object detection model. Our model utilizes two attention mechanisms, CBAM and CA, and is trained and tested on a dataset of apple images. To enable fruit counting across video frames in complex environments, we incorporate two multi-object tracking methods based on Kalman filtering and motion trajectory prediction, namely SORT and Cascade-SORT. Our results show that the YOLOv7-CA model achieved a 91.3% mAP and a 0.85 F1 score, representing a 4% improvement in mAP and a 0.02 improvement in F1 score compared to using YOLOv7 alone. Furthermore, three multi-object tracking methods demonstrated a significant improvement in MAE for inter-frame counting across all three test videos, with our multi-object tracking method improving MAE by 0.642 over using YOLOv7 alone. These findings suggest that our proposed model has the potential to improve fruit yield assessment methods and could have implications for decision-making in the fruit industry.
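For readers unfamiliar with the two summary metrics reported above, the following is a minimal sketch of how an F1 score and a counting MAE are commonly computed. It is not the authors' evaluation code: the function names `f1_score` and `count_mae` and the sample counts are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def count_mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between predicted and true per-video fruit counts."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical counts for three test videos (not taken from the paper).
    predicted_counts = [10, 12, 9]
    true_counts = [11, 12, 12]
    print(f1_score(0.5, 0.5))
    print(count_mae(predicted_counts, true_counts))
```

A lower MAE means the tracked count is closer to the true number of fruits in each video, which is the sense in which the tracking methods are said to improve on per-frame detection alone.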