INTRODUCTION

Yunnan Xiaomila is a pepper variety that flowers and bears fruit at the same time, several times a year. Its fruits are poorly distinguished from the complex background, and the targets are small and difficult to identify.

METHODS

This paper addresses target detection of Yunnan Xiaomila against complex backgrounds. To reduce the impact of the small color-gradient difference between the fruits and the background and of unclear feature information, an improved PAE-YOLO model is proposed. It integrates the EMA attention mechanism and DCNv3 deformable convolution into YOLOv8, which improves the model's feature extraction capability and inference speed for Xiaomila in complex environments while keeping the model lightweight. First, the EMA attention mechanism is combined with the C2f module of the YOLOv8 network. The C2f module extracts local features from the input image well, while the EMA attention mechanism captures global relationships; the two complement each other and strengthen the model's representational ability. Second, the DCNv3 convolution module is introduced into the backbone and head networks. It adaptively adjusts its sampling positions according to the input feature map, which strengthens feature capture for targets of different scales and makes the network lighter. In addition, a depth camera is used to estimate the pose of Xiaomila fruits, and different occlusion situations are analyzed and optimized. The effectiveness of the proposed method was verified through ablation experiments, model comparison experiments, and pose (attitude) estimation experiments.
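To make "adaptively adjusts its sampling positions" concrete, here is a minimal sketch of the DCNv3-style aggregation for one output location, y(p0) = w · Σ_k m_k · x(p0 + p_k + Δp_k), where the modulation scalars m_k are softmax-normalized over the K sampling points. It is a single-group, single-channel simplification written in plain Python: the offsets Δp_k and modulation logits are passed in directly (in the real module a lightweight layer predicts them from the input feature map, with separate offsets per group), and the names `dcnv3_point`, `bilinear_sample`, and `BASE_GRID` are hypothetical. This is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]

# Regular 3x3 sampling grid p_k, as (dy, dx) offsets around the output location p0.
BASE_GRID: List[Tuple[int, int]] = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear_sample(x: Grid, py: float, px: float) -> float:
    """Sample feature map x at a fractional position; out-of-bounds pixels count as 0."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Bilinear weight shrinks linearly with distance to the integer pixel.
                total += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy][xx]
    return total


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def dcnv3_point(x: Grid, p0: Tuple[int, int],
                offsets: Sequence[Tuple[float, float]],
                mod_logits: Sequence[float], weight: float = 1.0) -> float:
    """Simplified DCNv3 output at p0: weight * sum_k m_k * x(p0 + p_k + dp_k)."""
    if len(offsets) != len(BASE_GRID) or len(mod_logits) != len(BASE_GRID):
        raise ValueError("need one offset and one modulation logit per sampling point")
    masks = softmax(mod_logits)  # DCNv3 normalizes modulation across the K points
    acc = 0.0
    for (ky, kx), (dy, dx), m in zip(BASE_GRID, offsets, masks):
        acc += m * bilinear_sample(x, p0[0] + ky + dy, p0[1] + kx + dx)
    return weight * acc
```

With zero offsets and equal logits this reduces to an average over the regular 3x3 window. Giving one sampling point a fractional offset and the largest logit shifts most of the response toward wherever that point lands, which is the mechanism that lets the kernel follow irregularly shaped or differently sized fruits.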

RESULTS

The experimental results indicated that the model achieved a mean average precision (mAP) of 88.8%, 1.3% higher than that of the original model. Its F1 score reached 83.2, and its computational cost and model size were 7.6 GFLOPs and 5.7 MB, respectively. Among the networks compared, it had the best F1 score and the smallest model weight and giga floating-point operations (GFLOPs), which were 6.2% and 8.1% lower than those of the original model, respectively. It also had the lowest loss value during training and the fastest convergence. For pose estimation on 102 targets, the orientation was correctly estimated in more than 85% of cases, with an average error angle of 15.91°. Under occlusion, 86.3% of the pose estimation error angles were below 40°, and the average error angle was 23.19°.
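The abstract reports average error angles and the share of errors below 40° but does not say how the estimated and reference orientations are obtained. A minimal sketch of one way such numbers could be computed, assuming a pinhole depth camera with intrinsics `fx`, `fy`, `cx`, `cy` and an orientation defined by two hypothetical keypoints on each fruit (stem end and tip): back-project both pixels to 3D, take the stem-to-tip vector as the fruit axis, and measure the angle to a reference axis. All function names are illustrative assumptions, not the authors' pipeline.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def deproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Back-project pixel (u, v) with metric depth into camera coordinates (pinhole model)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def axis_from_keypoints(stem: Vec3, tip: Vec3) -> Vec3:
    """Hypothetical fruit orientation: the vector from the stem end to the tip."""
    return (tip[0] - stem[0], tip[1] - stem[1], tip[2] - stem[2])


def angle_error_deg(est: Vec3, ref: Vec3) -> float:
    """Angle in degrees between an estimated and a reference orientation vector."""
    dot = sum(a * b for a, b in zip(est, ref))
    norm = math.sqrt(sum(a * a for a in est)) * math.sqrt(sum(b * b for b in ref))
    if norm == 0.0:
        raise ValueError("orientation vectors must be non-zero")
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against floating-point drift
    return math.degrees(math.acos(cos))


def summarize(errors: Sequence[float], threshold_deg: float = 40.0) -> Tuple[float, float]:
    """Return (mean error angle, fraction of error angles below threshold_deg)."""
    mean = sum(errors) / len(errors)
    frac = sum(1 for e in errors if e < threshold_deg) / len(errors)
    return mean, frac
```

In use, each detected fruit's stem and tip pixels would be passed through `deproject`, turned into an axis with `axis_from_keypoints`, compared with its reference axis via `angle_error_deg`, and the per-fruit errors fed to `summarize` to obtain a mean error angle and a below-40° fraction of the kind reported above.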

DISCUSSION

The results show that the improved detection model can accurately identify Xiaomila fruits, with higher accuracy and lower computational complexity, and can better estimate the pose of the targets.
