Bao Liuxin, Zhou Xiaofei, Lu Xiankai, Sun Yaoqi, Yin Haibing, Hu Zhenghui, Zhang Jiyong, Yan Chenggang
IEEE Trans Image Process. 2024;33:3212-3226. doi: 10.1109/TIP.2024.3393365. Epub 2024 May 6.
Depth images and thermal images contain spatial geometry information and surface temperature information, respectively, which can serve as complementary cues for the RGB modality. However, the quality of depth and thermal images is often unreliable in challenging scenarios, which degrades the performance of two-modality salient object detection (SOD). Meanwhile, some researchers have turned to the triple-modal SOD task, namely visible-depth-thermal (VDT) SOD, attempting to exploit the complementarity of the RGB, depth, and thermal images. However, existing triple-modal SOD methods fail to perceive the quality of the depth maps and thermal images, which leads to performance degradation in scenes with low-quality depth and thermal inputs. Therefore, in this paper, we propose a quality-aware selective fusion network (QSF-Net) for VDT salient object detection, which consists of three subnets: the initial feature extraction subnet, the quality-aware region selection subnet, and the region-guided selective fusion subnet. First, besides extracting features, the initial feature extraction subnet generates a preliminary prediction map for each modality via a shrinkage pyramid architecture equipped with the multi-scale fusion (MSF) module. Then, we design a weakly-supervised quality-aware region selection subnet to generate the quality-aware maps. Concretely, we first locate the high-quality and low-quality regions using the preliminary predictions; these regions constitute the pseudo labels used to train this subnet. Finally, the region-guided selective fusion subnet purifies the initial features under the guidance of the quality-aware maps, and then fuses the triple-modal features and refines the edge details of the prediction maps through the intra-modality and inter-modality attention (IIA) module and the edge refinement (ER) module, respectively.
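The two central ideas above — deriving region-level quality pseudo labels from the preliminary predictions, and using quality-aware maps to gate the per-modality features before fusion — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the agreement-with-consensus criterion, the thresholds, and the weighted-sum fusion are all simplifying assumptions standing in for the learned subnets (the actual quality-aware region selection subnet and the IIA/ER modules are trained networks).

```python
import numpy as np

def pseudo_quality_labels(pred_rgb, pred_d, pred_t,
                          agree_thr=0.3, disagree_thr=0.7):
    """Sketch of quality pseudo-label mining from the three preliminary
    prediction maps (values in [0, 1]).  Pixels where a modality's
    prediction agrees with the cross-modal consensus are marked
    high-quality (1); pixels that strongly disagree are low-quality (0);
    the rest stay ambiguous (-1) and would be ignored during training.
    The consensus/threshold rule is an illustrative assumption."""
    consensus = (pred_rgb + pred_d + pred_t) / 3.0
    labels = {}
    for name, pred in (("rgb", pred_rgb), ("d", pred_d), ("t", pred_t)):
        err = np.abs(pred - consensus)
        label = np.full_like(pred, -1.0)
        label[err < agree_thr] = 1.0     # high-quality region
        label[err > disagree_thr] = 0.0  # low-quality region
        labels[name] = label
    return labels

def selective_fuse(feat_rgb, feat_d, feat_t, q_rgb, q_d, q_t, eps=1e-6):
    """Purify each modality's (C, H, W) features with its (H, W)
    quality-aware map and fuse them as a quality-weighted sum."""
    w = np.stack([q_rgb, q_d, q_t])               # (3, H, W)
    w = w / (w.sum(axis=0, keepdims=True) + eps)  # normalize per pixel
    feats = np.stack([feat_rgb, feat_d, feat_t])  # (3, C, H, W)
    return (w[:, None] * feats).sum(axis=0)       # (C, H, W)
```

With this weighting, a modality whose quality map is near zero at a pixel contributes almost nothing to the fused feature there, which is the intuition behind quality-guided selective fusion.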
Extensive experiments are performed on the VDT-2048 dataset, and the results show that our saliency model consistently outperforms 13 state-of-the-art methods by a large margin. Our code and results are available at https://github.com/Lx-Bao/QSFNet.