Song Mengke, Li Luming, Yu Xu, Chen Chenglizhao
IEEE Trans Image Process. 2025;34:3903-3917. doi: 10.1109/TIP.2025.3576993.
Salient Object Detection (SOD) aims to identify the most attention-grabbing regions in an image, distinguishing salient objects from their backgrounds. Current SOD methods are primarily discriminative, which works well for clear images but struggles in complex scenes where objects and backgrounds share similar colors and textures. To address these limitations, we introduce the diffusion-based salient object detection model (DiffSOD), which leverages a noise-to-image denoising process within a diffusion framework to enhance saliency detection in both RGB and RGB-D images. Unlike conventional fusion-based SOD methods that directly merge RGB and depth information, we treat RGB and depth as distinct conditions, namely an appearance condition and a structure condition. These conditions serve as controls within the diffusion UNet architecture, guiding the denoising process. To provide this guidance, we employ two specialized control adapters: an appearance control adapter and a structure control adapter. Moreover, conventional denoising UNet models may struggle with low-quality depth maps, which can introduce detrimental cues into the denoising process. To mitigate this, we introduce a quality-aware filter that selectively passes only high-quality depth data, ensuring the denoising process rests on reliable information. Comparative evaluations on benchmark datasets show that DiffSOD substantially surpasses existing RGB and RGB-D saliency detection methods, improving average performance by 1.5% and 1.2%, respectively, and setting a new benchmark for diffusion-based dense prediction models in visual saliency detection.
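The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of the conditioning scheme it describes: a noisy saliency map and timestep enter a denoising network while two control adapters inject RGB (appearance) and depth (structure) cues. All module names, channel sizes, the zero-initialized injection-by-addition design, and the toy noise schedule are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn


class ControlAdapter(nn.Module):
    """Encodes a conditioning image into features added to the denoiser's features."""

    def __init__(self, in_ch: int, feat_ch: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )
        # Zero-initialized projection so the adapter starts as a no-op,
        # a common trick in ControlNet-style conditioning (an assumption here).
        self.proj = nn.Conv2d(feat_ch, feat_ch, 1)
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        return self.proj(self.encoder(cond))


class TinyDenoiser(nn.Module):
    """Stand-in for the diffusion UNet: predicts the noise on the saliency map."""

    def __init__(self, feat_ch: int = 32):
        super().__init__()
        self.in_conv = nn.Conv2d(1, feat_ch, 3, padding=1)    # noisy saliency map
        self.t_embed = nn.Linear(1, feat_ch)                  # timestep embedding
        self.appearance = ControlAdapter(3, feat_ch)          # RGB condition
        self.structure = ControlAdapter(1, feat_ch)           # depth condition
        self.out_conv = nn.Conv2d(feat_ch, 1, 3, padding=1)   # predicted noise

    def forward(self, x_t, t, rgb, depth):
        h = self.in_conv(x_t) + self.t_embed(t[:, None])[:, :, None, None]
        h = h + self.appearance(rgb) + self.structure(depth)  # condition injection
        return self.out_conv(torch.relu(h))


# One toy training step: blend noise into a ground-truth saliency map,
# then train the conditioned denoiser to predict that noise back.
model = TinyDenoiser()
rgb = torch.rand(2, 3, 64, 64)
depth = torch.rand(2, 1, 64, 64)
gt = torch.rand(2, 1, 64, 64)
t = torch.rand(2)                                  # normalized timestep in [0, 1]
noise = torch.randn_like(gt)
x_t = (1 - t)[:, None, None, None] * gt + t[:, None, None, None] * noise
loss = nn.functional.mse_loss(model(x_t, t, rgb, depth), noise)
print(loss.item())
```

At inference, the same network would be applied iteratively from pure noise to a saliency map, with the RGB and depth conditions steering each denoising step.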
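The quality-aware filter is likewise unspecified beyond its purpose, so the sketch below is one plausible reading under stated assumptions: a small scoring network estimates a per-map depth quality in [0, 1] and softly gates the structure condition's features, so unreliable depth contributes little to the denoising process. The scorer architecture and the soft multiplicative gate are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn


class QualityAwareFilter(nn.Module):
    """Predicts a [0, 1] quality score per depth map and gates its features."""

    def __init__(self):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1), nn.Sigmoid(),
        )

    def forward(self, depth: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        score = self.scorer(depth)              # (B, 1) quality estimate
        # Soft gate: features from low-quality depth are attenuated toward
        # zero, so the structure condition acts only when depth is reliable.
        return score[:, :, None, None] * depth_feat


# Gating the structure-adapter features before they are added to the denoiser.
filt = QualityAwareFilter()
depth = torch.rand(4, 1, 64, 64)
depth_feat = torch.randn(4, 32, 64, 64)
gated = filt(depth, depth_feat)
print(gated.shape)  # torch.Size([4, 32, 64, 64])
```

A hard variant would threshold the score and drop low-quality maps entirely; the soft gate shown here simply keeps the toy example differentiable end to end.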