CECS-CLIP: Fusing Domain Knowledge for Rare Wildlife Detection Model.

Authors

Yang Feng, Hu Chunying, Liang Aokang, Wang Sheng, Su Yun, Xu Fu

Affiliations

School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China.

Engineering Research Center for Forestry-Oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China.

Publication information

Animals (Basel). 2024 Oct 9;14(19):2909. doi: 10.3390/ani14192909.

Abstract

Accurate and efficient wildlife monitoring is essential for conservation efforts. Traditional image-based methods often struggle to detect small, occluded, or camouflaged animals in complex natural environments. To overcome these limitations, this study proposes a multimodal target detection framework that integrates textual information from an animal knowledge base as supplementary features to enhance detection performance. First, a concept enhancement module was developed; it uses a cross-attention mechanism to fuse features according to the correlation between textual and image features, yielding enhanced image features. Second, a feature normalization module was developed; it amplifies cosine similarity and introduces learnable parameters that continuously weight and transform the image features, further enhancing their expressive power in the feature space. Experimental validation on a specialized dataset provided by the research team at Northwest A&F University shows that the multimodal model achieved a 0.3% improvement in precision over single-modal methods. Compared with existing multimodal target detection algorithms, the model achieved at least a 25% improvement in average precision (AP) and excelled at detecting small targets of certain species, significantly surpassing existing multimodal target detection model benchmarks. The study thus offers a multimodal target detection model that integrates textual and image information for the conservation of rare and endangered wildlife, providing strong evidence and new perspectives for research in this field.
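
The two modules named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no layer sizes, projection layout, or parameter names, so every function name, shape, and weight below (`concept_enhance`, `scaled_cosine_logits`, `w_q`, `w_k`, `w_v`, `gamma`, `beta`, `scale`, the residual connection) is an assumption made for illustration. It only shows the general technique: image region features attend over text features via cross-attention, and the result is passed through a learnable affine transform before a scaled cosine similarity against the text features.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def concept_enhance(image_feats, text_feats, w_q, w_k, w_v):
    """Cross-attention sketch: image regions query the text features.

    image_feats: (num_regions, dim), text_feats: (num_concepts, dim).
    Returns the enhanced image features and the attention weights.
    The residual connection is an assumption, not taken from the paper.
    """
    q = image_feats @ w_q                     # queries from image regions
    k = text_feats @ w_k                      # keys from text features
    v = text_feats @ w_v                      # values from text features
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    enhanced = image_feats + attn @ v         # add text-derived context
    return enhanced, attn


def scaled_cosine_logits(image_feats, text_feats, gamma, beta, scale):
    """Affine transform (hypothetical gamma, beta), then scaled cosine similarity."""
    transformed = image_feats * gamma + beta
    return scale * (l2_normalize(transformed) @ l2_normalize(text_feats).T)


# Toy data: 4 image regions, 3 text concepts, 8-dimensional features.
rng = np.random.default_rng(0)
num_regions, num_concepts, dim = 4, 3, 8
image_feats = rng.normal(size=(num_regions, dim))
text_feats = rng.normal(size=(num_concepts, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))
gamma, beta, scale = np.ones(dim), np.zeros(dim), 10.0

enhanced, attn = concept_enhance(image_feats, text_feats, w_q, w_k, w_v)
logits = scaled_cosine_logits(enhanced, text_feats, gamma, beta, scale)
print(enhanced.shape, attn.shape, logits.shape)
```

In this sketch each row of `attn` sums to 1, so every image region receives a convex combination of the projected text features, and `scale` plays the role of the factor that amplifies cosine similarity before classification. In a trained model `gamma`, `beta`, `scale`, and the projection matrices would be learned rather than fixed as they are here.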

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/60e0/11476111/6f8b432c8ded/animals-14-02909-g001.jpg
