CECS-CLIP: Fusing Domain Knowledge for Rare Wildlife Detection Model

Authors

Yang Feng, Hu Chunying, Liang Aokang, Wang Sheng, Su Yun, Xu Fu

Affiliations

School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China.

Engineering Research Center for Forestry-Oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China.

Publication information

Animals (Basel). 2024 Oct 9;14(19):2909. doi: 10.3390/ani14192909.

DOI:10.3390/ani14192909

PMID:39409858

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11476111/

Abstract

Accurate and efficient wildlife monitoring is essential for conservation efforts. Traditional image-based methods often struggle to detect small, occluded, or camouflaged animals due to the challenges posed by complex natural environments. To overcome these limitations, an innovative multimodal target detection framework is proposed in this study, which integrates textual information from an animal knowledge base as supplementary features to enhance detection performance. First, a concept enhancement module was developed, employing a cross-attention mechanism to fuse features based on the correlation between textual and image features, thereby obtaining enhanced image features. Secondly, a feature normalization module was developed, amplifying cosine similarity and introducing learnable parameters to continuously weight and transform image features, further enhancing their expressive power in the feature space. Rigorous experimental validation on a specialized dataset provided by the research team at Northwest A&F University demonstrates that our multimodal model achieved a 0.3% improvement in precision over single-modal methods. Compared to existing multimodal target detection algorithms, this model achieved at least a 25% improvement in AP and excelled in detecting small targets of certain species, significantly surpassing existing multimodal target detection model benchmarks. This study offers a multimodal target detection model integrating textual and image information for the conservation of rare and endangered wildlife, providing strong evidence and new perspectives for research in this field.
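
The abstract names two mechanisms without giving their equations: cross-attention fusion of text and image features, and an amplified cosine similarity with learnable weighting. The following is a minimal sketch of those two general techniques, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the identity query/key/value projections, the residual addition, the fixed `scale` value, and the per-dimension `gamma`/`beta` parameters (which would be learned during training but are held constant here).

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_fuse(image_feats, text_feats):
    """Hypothetical fusion step: each image feature is a query that attends
    over the text features (keys and values). Identity projections are
    assumed; the attended text vector is added back to the image feature."""
    d = len(text_feats[0])
    fused = []
    for q in image_feats:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in text_feats])
        attended = [sum(w * v[j] for w, v in zip(weights, text_feats))
                    for j in range(d)]
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused

def l2_normalize(v, eps=1e-12):
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / (norm + eps) for x in v]

def scaled_cosine_logits(image_feat, text_feats, scale=10.0,
                         gamma=None, beta=None):
    """Hypothetical normalization step: apply a per-dimension affine
    transform (gamma, beta) to the image feature, then return the cosine
    similarity to each text feature multiplied by `scale`."""
    d = len(image_feat)
    gamma = gamma or [1.0] * d
    beta = beta or [0.0] * d
    transformed = [g * x + b for g, x, b in zip(gamma, image_feat, beta)]
    img = l2_normalize(transformed)
    return [scale * dot(img, l2_normalize(t)) for t in text_feats]

# Toy inputs: two image region features and two text (class) features.
image_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text_feats = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

fused = cross_attention_fuse(image_feats, text_feats)
logits = scaled_cosine_logits(fused[0], text_feats)
print(logits)
```

In this toy case the first image feature points in the same direction as the first text feature, so its logit for that text entry is the larger of the two. Multiplying by `scale` widens the gap between similar and dissimilar pairs before any later softmax or thresholding, which is one common reading of "amplifying cosine similarity".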

