A CNN-Transformer Hybrid Framework for Multi-Label Predator-Prey Detection in Agricultural Fields.

Author information

Lyu Yifan, Lu Feiyu, Wang Xuaner, Wang Yakui, Wang Zihuan, Zhu Yawen, Wang Zhewei, Dong Min

Affiliations

China Agricultural University, Beijing 100083, China.

University of International Business and Economics, Beijing 100029, China.

Publication information

Sensors (Basel). 2025 Jul 31;25(15):4719. doi: 10.3390/s25154719.

DOI:10.3390/s25154719

PMID:40807883

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12349504/

Abstract

Accurate identification of predator-pest relationships is essential for implementing effective and sustainable biological control in agriculture. However, existing image-based methods struggle to recognize insect co-occurrence under complex field conditions, limiting their ecological applicability. To address this challenge, we propose a hybrid deep learning framework that integrates convolutional neural networks (CNNs) and Transformer architectures for multi-label recognition of predator-pest combinations. The model leverages a novel co-occurrence attention mechanism to capture semantic relationships between insect categories and employs a pairwise label matching loss to enhance ecological pairing accuracy. Evaluated on a field-constructed dataset of 5,037 images across eight categories, the model achieved an F1-score of 86.5% and an mAP50 of 85.1%, and it generalized well to unseen predator-pest pairs, with an average F1-score of 79.6%. These results exceed those of several strong baselines, including ResNet-50, YOLOv8, and Vision Transformer. This work contributes a robust, interpretable approach for multi-object ecological detection and offers practical potential for deployment in smart farming systems, UAV-based monitoring, and precision pest management.
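For readers unfamiliar with how an F1-score is obtained in a multi-label setting, the following is a minimal sketch, not the authors' evaluation code. It assumes each image carries a set of category labels and that the per-class scores are macro-averaged; the abstract does not say which averaging scheme was used. The function names and the category names in the usage note are hypothetical.

```python
from typing import Dict, List, Set


def per_class_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]], classes: List[str]
) -> Dict[str, float]:
    """Return the F1-score of each class over a set of multi-label images."""
    scores: Dict[str, float] = {}
    for c in classes:
        # Count true positives, false positives, and false negatives for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if c in t and c in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if c not in t and c in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if c in t and c not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]], classes: List[str]
) -> float:
    """Unweighted mean of the per-class F1-scores (an assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred, classes)
    return sum(scores.values()) / len(scores)
```

As a usage illustration, calling `macro_f1` with ground-truth label sets such as `{"ladybird", "aphid"}` and the corresponding predicted sets for each image returns a single value between 0 and 1. A class that is missed in one image and falsely predicted in another pulls the macro average down even when the other classes are recognized perfectly.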