Peng Yanbin, Zhai Zhinian, Feng Mingkun
School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China.
Sensors (Basel). 2024 Feb 8;24(4):1117. doi: 10.3390/s24041117.
Salient Object Detection (SOD) in RGB-D images plays a crucial role in computer vision; its central aim is to identify and segment the most visually striking objects in a scene. However, optimally fusing multi-modal and multi-scale features to improve detection performance remains a challenge. To address this issue, we propose a network model based on semantic localization and multi-scale fusion (SLMSF-Net), designed specifically for RGB-D SOD. First, we design a Deep Attention Module (DAM), which extracts valuable depth feature information from both channel and spatial perspectives and efficiently merges it with the RGB features. Next, a Semantic Localization Module (SLM) is introduced to enhance the top-level modality-fusion features, enabling precise localization of salient objects. Finally, a Multi-Scale Fusion Module (MSF) performs inverse decoding on the modality-fusion features, restoring the objects' detail and generating high-precision saliency maps. Our approach is validated on six RGB-D salient object detection datasets. The experimental results show improvements of 0.20–1.80%, 0.09–1.46%, 0.19–1.05%, and 0.0002–0.0062 in the maxF, maxE, S, and MAE metrics, respectively, compared with the best competing methods (AFNet, DCMF, and C2DFNet).
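The abstract gives no implementation details, so the following PyTorch sketch is only illustrative: it shows one plausible way to reweight depth features with channel and spatial attention before merging them with RGB features, in the spirit of the DAM described above. The class name DepthAttentionFusion, the reduction ratio, the 7x7 spatial-attention kernel, and the additive fusion step are all assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the authors' code) of a depth-attention fusion block:
# depth features are reweighted along the channel and spatial dimensions,
# then merged with the RGB features. Names and hyperparameters are assumed.
class DepthAttentionFusion(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite per-channel weights.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: compress channel statistics to one weight map.
        self.spatial_att = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        # Reweight depth features along the channel dimension.
        d = depth_feat * self.channel_att(depth_feat)
        # Build a spatial map from max- and mean-pooled channel statistics.
        pooled = torch.cat(
            [d.max(dim=1, keepdim=True).values, d.mean(dim=1, keepdim=True)],
            dim=1,
        )
        d = d * self.spatial_att(pooled)
        # Merge the attended depth features with the RGB features
        # (additive fusion chosen here for simplicity).
        return rgb_feat + d


# Example: fuse 64-channel RGB and depth feature maps of size 32x32.
fusion = DepthAttentionFusion(channels=64)
out = fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```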