IEEE Trans Image Process. 2023;32:2267-2278. doi: 10.1109/TIP.2023.3266659. Epub 2023 Apr 21.
Camouflaged object detection (COD) aims to discover objects that blend into the background because of similar colors or textures. Existing deep learning methods do not systematically articulate the key tasks in COD, which seriously hinders performance improvement. In this paper, we introduce the concept of focus areas, regions that contain discernible colors or textures, and develop a two-stage focus scanning network for camouflaged object detection. Specifically, a novel encoder-decoder module is first designed to determine a region where the focus areas may appear. In this process, a multi-layer Swin transformer is deployed to encode global context information between the object and the background, and a novel cross-connection decoder is proposed to fuse cross-layer textures or semantics. Then, we use multi-scale dilated convolution to obtain discriminative features at different scales in the focus areas. Meanwhile, a dynamic difficulty-aware loss is designed to guide the network to pay more attention to structural details. Extensive experimental results on four benchmarks, CAMO, CHAMELEON, COD10K, and NC4K, show that the proposed method performs favorably against other state-of-the-art methods.
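As a rough illustration of the multi-scale dilated convolution mentioned above, the following dependency-free Python sketch applies one 3x3 kernel at several dilation rates and returns one feature map per rate. It shows only the general technique, not the paper's module: the function names `dilated_conv2d` and `multi_scale_features`, the box kernel, the dilation rates (1, 2, 3), and the zero padding are all assumptions, since the abstract does not specify them.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def dilated_conv2d(image: Matrix, kernel: Matrix, dilation: int) -> Matrix:
    """Same-size 2D dilated cross-correlation with zero padding (hypothetical helper)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2  # index of the kernel centre
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Kernel taps are spaced `dilation` pixels apart.
                    yy = y + (i - ch) * dilation
                    xx = x + (j - cw) * dilation
                    if 0 <= yy < h and 0 <= xx < w:  # outside the map counts as zero
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def multi_scale_features(
    image: Matrix, kernel: Matrix, dilations: Sequence[int] = (1, 2, 3)
) -> List[Matrix]:
    """Apply the same kernel at several dilation rates (rates are assumed, not from the paper)."""
    return [dilated_conv2d(image, kernel, d) for d in dilations]


# Toy input: a 7x7 map with a single bright pixel in the centre.
image = [[0.0] * 7 for _ in range(7)]
image[3][3] = 1.0
kernel = [[1.0] * 3 for _ in range(3)]  # 3x3 box filter, illustrative only
features = multi_scale_features(image, kernel)
```

Larger dilation rates spread the same nine kernel taps over a wider neighbourhood, so each feature map responds to structure at a different scale without adding parameters. In a trained network the kernels would be learned and the per-rate maps would typically be fused, for example by concatenation; that fusion step is omitted here.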