High-Accuracy Recognition Method for Diseased Chicken Feces Based on Image and Text Information Fusion

Authors

Yang Duanli, Tian Zishang, Xi Jianzhong, Chen Hui, Sun Erdong, Wang Lianzeng

Affiliations

College of Information Science and Technology, Hebei Agricultural University, Baoding 071001, China.

Hebei Key Laboratory of Agricultural Big Data, Baoding 071001, China.

Publication information

Animals (Basel). 2025 Jul 22;15(15):2158. doi: 10.3390/ani15152158.

DOI:10.3390/ani15152158

PMID:40804949

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12345488/

Abstract

Poultry feces are a critical biomarker for health assessment, and timely, accurate identification of pathological feces is important for food safety. Conventional visual-only methods are limited by their sensitivity to environmental conditions and by the high visual similarity among feces from different diseases. To address this, we propose MMCD (Multimodal Chicken-feces Diagnosis), a ResNet50-based multimodal fusion model that leverages the semantic complementarity between images and descriptive text to enhance diagnostic precision. Key innovations include the following: (1) integrating MASA (Manhattan self-attention) and DSconv (depthwise separable convolution) into the backbone network to mitigate feature confusion; (2) using a pre-trained BERT to extract textual semantic features, reducing annotation dependency and cost; and (3) designing a lightweight Gated Cross-Attention (GCA) module for dynamic multimodal fusion, which achieves a 41% parameter reduction versus cross-modal transformers. Experiments demonstrate that MMCD significantly outperforms single-modal baselines in Accuracy (+8.69%), Recall (+8.72%), Precision (+8.67%), and F1 score (+8.72%). It surpasses simple feature concatenation by 2.51-2.82% and reduces parameters by 7.5M and computations by 1.62 GFLOPs versus the base ResNet50. This work validates the efficacy of multimodal fusion in pathological fecal detection, providing a theoretical and technical foundation for agricultural health monitoring systems.
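The abstract credits part of the parameter saving to depthwise separable convolution. As a rough illustration of why that substitution shrinks a layer, the sketch below counts the weights of a standard convolution and of its depthwise separable counterpart using the standard formulas. The function names and the 256-channel, 3x3 layer are hypothetical choices made for this example; they are not taken from the paper, and the sketch does not attempt to reproduce the reported 7.5M reduction.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a k x k standard convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a depthwise separable convolution (bias terms ignored).

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution that mixes channels.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer size only (an assumption, not a value from the paper).
    c_in, c_out, k = 256, 256, 3
    std = standard_conv_params(c_in, c_out, k)
    dsc = depthwise_separable_params(c_in, c_out, k)
    print(f"standard conv:       {std} weights")
    print(f"depthwise separable: {dsc} weights")
    print(f"ratio:               {dsc / std:.4f}")
```

For this example layer the separable version keeps roughly 1/c_out + 1/k^2 of the original weights, about 11.5% here. How much of a full network's saving comes from this substitution depends on which layers are replaced, which the abstract does not specify.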
