Department of Computing, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China.
Neural Netw. 2024 Feb;170:521-534. doi: 10.1016/j.neunet.2023.11.051. Epub 2023 Nov 24.
Image Salient Object Detection (SOD) is a fundamental research topic in computer vision. Recently, multimodal information from the RGB, Depth (D), and Thermal (T) modalities has been proven beneficial to SOD. However, existing methods are designed only for RGB-D or RGB-T SOD, which limits their applicability across modalities, or are merely fine-tuned on specific datasets, which incurs extra computational overhead. These defects hinder the practical deployment of SOD in real-world applications. In this paper, we propose an end-to-end Unified Triplet Decoder Network, dubbed UTDNet, for both RGB-T and RGB-D SOD tasks. The challenges of unified multimodal SOD are mainly two-fold: (1) accurately detecting and segmenting salient objects, and (2) doing so, preferably, with a single network that fits both RGB-T and RGB-D SOD. First, to address the former challenge, we propose a multi-scale feature extraction unit to enrich discriminative contextual information and an efficient fusion module to exploit cross-modality complementary information. The multimodal features are then fed to the triplet decoder, where a hierarchical deep supervision loss further enables the network to capture distinctive saliency cues. Second, to address the latter challenge, we propose a simple yet effective continual learning method to unify multimodal SOD. Concretely, we train the multimodal SOD tasks sequentially, applying Elastic Weight Consolidation (EWC) regularization together with the hierarchical loss function to avoid catastrophic forgetting without introducing additional parameters. Critically, the triplet decoder separates task-specific from task-invariant information, making the network easily adaptable to multimodal SOD tasks. Extensive comparisons with 26 recently proposed RGB-T and RGB-D SOD methods demonstrate the superiority of the proposed UTDNet.
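The continual-learning step above relies on the standard EWC penalty, which discourages parameters important to a previous task from drifting when training on the next one. The following is a minimal NumPy sketch of that penalty alone; the function name, the flattened-parameter representation, and the scalar weight `lam` are illustrative assumptions, not part of UTDNet's actual implementation.

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Standard EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2.

    params     -- current (flattened) model parameters for the new task
    old_params -- parameters frozen after training the previous task
    fisher     -- diagonal Fisher information estimated on the previous task,
                  measuring how important each parameter was to that task
    lam        -- strength of the consolidation term (hypothetical default)
    """
    return 0.5 * lam * float(np.sum(fisher * (params - old_params) ** 2))

# Illustrative usage: a parameter with high Fisher weight contributes
# more to the penalty when it moves away from its old value.
theta_old = np.array([1.0, 1.0])
theta_new = np.array([1.0, 2.0])   # second parameter drifted by 1.0
fisher    = np.array([2.0, 2.0])
penalty = ewc_penalty(theta_new, theta_old, fisher, lam=1.0)
```

In a sequential training setup such as the one described, this penalty would be added to the task loss (here, the hierarchical deep supervision loss) so that the total objective balances new-task accuracy against retention of the previously learned task.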