Hierarchical Graph Interaction Transformer With Dynamic Token Clustering for Camouflaged Object Detection.

Author information

Yao Siyuan, Sun Hao, Xiang Tian-Zhu, Wang Xiao, Cao Xiaochun

Publication information

IEEE Trans Image Process. 2024;33:5936-5948. doi: 10.1109/TIP.2024.3475219. Epub 2024 Oct 18.

Abstract

Camouflaged object detection (COD) aims to identify objects that blend seamlessly into their surrounding backgrounds. Due to the intrinsic similarity between camouflaged objects and the background region, it is extremely challenging for existing approaches to distinguish camouflaged objects precisely. In this paper, we propose a hierarchical graph interaction network, termed HGINet, for camouflaged object detection, which is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features. Specifically, we first design a region-aware token focusing attention (RTFA) with dynamic token clustering to excavate the potentially distinguishable tokens in the local region. Afterwards, a hierarchical graph interaction transformer (HGIT) is proposed to construct bi-directional aligned communication between hierarchical features in the latent interaction space for visual semantics enhancement. Furthermore, we propose a decoder network with confidence aggregated feature fusion (CAFF) modules, which progressively fuses the hierarchical interacted features to refine the local detail in ambiguous regions. Extensive experiments conducted on the prevalent datasets, i.e., COD10K, CAMO, NC4K, and CHAMELEON, demonstrate the superior performance of HGINet compared to existing state-of-the-art methods. Our code is available at https://github.com/Garyson1204/HGINet.
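As a rough illustration of what clustering tokens means in general, the sketch below groups token feature vectors with plain k-means and returns one center per cluster. It is not the RTFA module from the paper: the abstract does not specify the clustering rule, so k-means is a swapped-in stand-in, and the names `cluster_tokens`, `squared_distance`, and `mean_vector` are hypothetical. The official implementation is in the repository linked above.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_tokens(
    tokens: Sequence[Sequence[float]],
    num_clusters: int,
    max_iterations: int = 20,
) -> Tuple[List[int], List[Vector]]:
    """Group tokens with plain k-means (an assumption, not the paper's rule).

    Returns the cluster index of every token and the cluster centers.
    """
    if not 1 <= num_clusters <= len(tokens):
        raise ValueError("num_clusters must be between 1 and len(tokens)")

    # Deterministic initialisation: pick evenly spaced tokens as centers.
    step = len(tokens) / num_clusters
    centers = [list(tokens[int(i * step)]) for i in range(num_clusters)]

    assignments: List[int] = []
    for _ in range(max_iterations):
        # Assign each token to its nearest center.
        new_assignments = [
            min(range(num_clusters), key=lambda c: squared_distance(t, centers[c]))
            for t in tokens
        ]
        if new_assignments == assignments:
            break  # Assignments are stable, so the centers are too.
        assignments = new_assignments

        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(num_clusters):
            members = [t for t, a in zip(tokens, assignments) if a == c]
            if members:
                centers[c] = mean_vector(members)

    return assignments, centers


if __name__ == "__main__":
    # Six toy 2-D "tokens" forming two well-separated groups.
    toy_tokens = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
                  [5.1, 5.0], [0.0, 0.1], [5.0, 5.1]]
    labels, merged = cluster_tokens(toy_tokens, num_clusters=2)
    print(labels)  # [0, 0, 1, 1, 0, 1]
    print(merged)
```

In a transformer setting, the cluster centers could serve as a reduced set of tokens for attention, which is the usual motivation for token clustering; how HGINet uses its clusters is described in the paper itself.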
