Zhang Zejin, Wang Tao, Wang Jian, Sun Yao
HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou 310018, China.
School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China.
J Imaging. 2024 Jan 18;10(1):24. doi: 10.3390/jimaging10010024.
Camouflaged objects blend into their surroundings, so the difference between foreground and background is easily overlooked, which places higher demands on detection systems. In this paper, we present a new framework for Camouflaged Object Detection (COD), named FSANet, which consists of three main components: spatial detail mining (SDM), cross-scale feature combination (CFC), and a hierarchical feature aggregation decoder (HFAD). The framework simulates the three-stage process by which the human visual system detects objects in a camouflaged scene. Specifically, we extract five feature layers using the backbone and divide them into two parts, with the second layer as the boundary. The SDM module simulates a human's cursory inspection of camouflaged objects: it gathers spatial details (such as edges and textures) and fuses the features to create a cursory impression. The CFC module observes high-level features from various viewing angles and extracts common features by thoroughly filtering features across levels. Within the CFC module, we also design a side-join multiplication to avoid detail distortion and use element-wise feature multiplication to filter out noise. Finally, we construct an HFAD module that deeply mines effective features from these two stages, uses high-level semantic knowledge to guide the fusion of low-level features, and refines the camouflage map through hierarchical cascading. Compared with nineteen deep-learning-based methods on seven widely used metrics, the proposed framework shows clear advantages on four public COD datasets, demonstrating the effectiveness and superiority of our model.
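The feature split and the multiplicative noise filtering mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the tensor shapes, the sigmoid gate, and the decision to place layers 1 and 2 in the low-level group are all assumptions made purely for illustration.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def split_features(feats, boundary=2):
    """Split backbone features into a low-level and a high-level group.

    Assumption: layers 1..boundary count as low-level, the rest as high-level.
    The abstract only says that the second layer is the boundary.
    """
    return feats[:boundary], feats[boundary:]


def multiplicative_filter(low, high):
    """Gate a low-level feature with a high-level one by element-wise multiplication.

    Hypothetical stand-in for the idea of suppressing noise through
    multiplication; the real CFC and HFAD modules are not specified here.
    """
    factor = low.shape[1] // high.shape[1]
    # Collapse channels to one spatial map, upsample it, squash it to (0, 1).
    gate = sigmoid(upsample_nearest(high, factor).mean(axis=0, keepdims=True))
    # Broadcasting applies the single-channel gate to every channel of `low`.
    return low * gate


rng = np.random.default_rng(0)
# Five hypothetical feature maps; channel counts and sizes are illustrative only.
feats = [rng.standard_normal((8, 64 // 2**i, 64 // 2**i)) for i in range(5)]
low_group, high_group = split_features(feats)
filtered = multiplicative_filter(low_group[1], high_group[0])
```

Because the gate lies strictly between 0 and 1, the multiplication can only attenuate low-level responses; locations where the high-level map responds weakly are damped the most, which is one simple way to read "filtering out noise" with semantic guidance.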