

An improved YOLOv7 model based on Swin Transformer and Trident Pyramid Networks for accurate tomato detection.

Author information

Liu Guoxu, Zhang Yonghui, Liu Jun, Liu Deyong, Chen Chunlei, Li Yujie, Zhang Xiujie, Touko Mbouembe Philippe Lyonel

Affiliations

School of Computer Engineering, Weifang University, Weifang, China.

Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology, Weifang, China.

Publication information

Front Plant Sci. 2024 Sep 26;15:1452821. doi: 10.3389/fpls.2024.1452821. eCollection 2024.

Abstract

Accurate fruit detection is crucial for automated fruit picking. However, real-world scenarios, influenced by complex environmental factors such as illumination variations, occlusion, and overlap, pose significant challenges to accurate fruit detection. These challenges subsequently impact the commercialization of fruit-harvesting robots. A tomato detection model named YOLO-SwinTF, based on YOLOv7, is proposed to address these challenges. Integrating Swin Transformer (ST) blocks into the backbone network enables the model to capture global information by modeling long-range visual dependencies. Trident Pyramid Networks (TPN) are introduced to overcome the limitations of PANet's focus on communication-based processing. TPN incorporates multiple self-processing (SP) modules within the existing top-down and bottom-up architectures, allowing feature maps to generate new findings for communication. In addition, Focaler-IoU is introduced to reconstruct the original intersection-over-union (IoU) loss so that the loss function adjusts its focus based on the distribution of difficult and easy samples. The proposed model is evaluated on a tomato dataset, and the experimental results demonstrate that its detection recall, precision, F-score, and AP reach 96.27%, 96.17%, 96.22%, and 98.67%, respectively. These represent improvements of 1.64%, 0.92%, 1.28%, and 0.88% over the original YOLOv7 model. Compared with other state-of-the-art detection methods, this approach achieves superior accuracy while maintaining comparable detection speed. The proposed model also exhibits strong robustness under various lighting and occlusion conditions, demonstrating its significant potential for tomato detection.
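
The abstract names the Focaler-IoU reconstruction but does not spell out its form. The sketch below is a minimal illustration of the general Focaler-IoU idea: plain IoU is remapped through a piecewise-linear function controlled by two thresholds so that the loss emphasizes a chosen band of easy or hard samples. The threshold values `d` and `u`, the function names, and the example numbers are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch of a Focaler-IoU-style remapping of the IoU loss.
# The piecewise-linear mapping follows the general Focaler-IoU formulation;
# how YOLO-SwinTF wires it into YOLOv7's box-regression loss is not
# specified in the abstract, so this example is illustrative only.
import torch


def focaler_iou(iou: torch.Tensor, d: float = 0.0, u: float = 0.95) -> torch.Tensor:
    """Remap plain IoU to focus the loss on a chosen difficulty band.

    IoU below d (very hard samples) maps to 0, IoU above u (very easy
    samples) maps to 1, and the band in between is rescaled linearly.
    """
    return ((iou - d) / (u - d)).clamp(min=0.0, max=1.0)


def focaler_iou_loss(iou: torch.Tensor, d: float = 0.0, u: float = 0.95) -> torch.Tensor:
    """Loss = 1 - remapped IoU, averaged over all predicted boxes."""
    return (1.0 - focaler_iou(iou, d, u)).mean()


if __name__ == "__main__":
    # Hypothetical IoUs of three predicted boxes against their matched targets.
    ious = torch.tensor([0.30, 0.65, 0.97])
    print(focaler_iou(ious))       # tensor([0.3158, 0.6842, 1.0000])
    print(focaler_iou_loss(ious))  # tensor(0.3333)
```

In a full detector, the remapped IoU would typically replace the plain IoU term inside whichever IoU-based regression loss the model already uses, with `d` and `u` tuned to shift attention toward harder or easier samples.
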


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de11/11464322/2c77a23570b7/fpls-15-1452821-g001.jpg
