Liu Guoxu, Zhang Yonghui, Liu Jun, Liu Deyong, Chen Chunlei, Li Yujie, Zhang Xiujie, Touko Mbouembe Philippe Lyonel
School of Computer Engineering, Weifang University, Weifang, China.
Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology, Weifang, China.
Front Plant Sci. 2024 Sep 26;15:1452821. doi: 10.3389/fpls.2024.1452821. eCollection 2024.
Accurate fruit detection is crucial for automated fruit picking. However, real-world scenarios, influenced by complex environmental factors such as illumination variation, occlusion, and overlap, pose significant challenges to accurate fruit detection. These challenges in turn hinder the commercialization of fruit-harvesting robots. To address them, a tomato detection model named YOLO-SwinTF, based on YOLOv7, is proposed. Integrating Swin Transformer (ST) blocks into the backbone network enables the model to capture global information by modeling long-range visual dependencies. Trident Pyramid Networks (TPN) are introduced to overcome the limitations of PANet's focus on communication-based processing: TPN incorporates multiple self-processing (SP) modules within the existing top-down and bottom-up architectures, allowing feature maps to generate new findings for communication. In addition, Focaler-IoU is introduced to reconstruct the original intersection-over-union (IoU) loss so that the loss function can adjust its focus according to the distribution of difficult and easy samples. The proposed model is evaluated on a tomato dataset, and the experimental results demonstrate that its detection recall, precision, F-score, and AP reach 96.27%, 96.17%, 96.22%, and 98.67%, respectively, improvements of 1.64, 0.92, 1.28, and 0.88 percentage points over the original YOLOv7 model. Compared to other state-of-the-art detection methods, this approach achieves superior accuracy while maintaining comparable detection speed. In addition, the proposed model exhibits strong robustness under various lighting and occlusion conditions, demonstrating its significant potential for tomato detection.
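The Focaler-IoU reconstruction mentioned in the abstract can be illustrated with a minimal sketch. This assumes the piecewise-linear remapping from the original Focaler-IoU formulation, with lower and upper thresholds `d` and `u` (names and defaults here are illustrative, not values reported by this paper): IoU values below `d` are clamped to 0, values above `u` to 1, and values in between are rescaled linearly, so choosing `d` and `u` shifts the loss's emphasis between easy and hard samples.

```python
def focaler_iou(iou: float, d: float = 0.0, u: float = 0.95) -> float:
    """Piecewise-linear Focaler-IoU remapping (sketch).

    d, u: assumed lower/upper IoU thresholds controlling whether the
    loss focuses on easy samples (low d, u) or hard samples (high d, u).
    """
    if iou < d:
        return 0.0
    if iou > u:
        return 1.0
    return (iou - d) / (u - d)


def focaler_iou_loss(iou: float, d: float = 0.0, u: float = 0.95) -> float:
    """Loss based on the remapped IoU: L = 1 - IoU_focaler."""
    return 1.0 - focaler_iou(iou, d, u)
```

In practice this remapped term would replace the plain IoU term inside the detector's box-regression loss; the thresholds then act as tuning knobs over the easy/hard sample distribution described in the abstract.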