Abbas Hyder, Ren Shen Bing, Asim Muhammad, Hassan Syeda Iqra, Abd El-Latif Ahmed A
State Key Laboratory of Public Big Data, College of Computer Science and Technology, Institute for Artificial Intelligence, Guizhou University, Guiyang, Guizhou, China.
School of Computer Science and Engineering, Central South University, Changsha, China.
PeerJ Comput Sci. 2025 May 19;11:e2623. doi: 10.7717/peerj-cs.2623. eCollection 2025.
Detecting and segmenting salient objects from natural scenes, commonly referred to as salient object detection, has attracted great interest in computer vision. Addressing the challenge posed by complex backgrounds in salient object detection is crucial for advancing the field. This article proposes a novel deep learning architecture for salient object detection, called SODU2-NET (Salient Object Detection U2-Net), which builds on the U-NET base structure. The model addresses a gap in previous work related to complex backgrounds by employing a densely supervised encoder-decoder network. The proposed SODU2-NET employs background subtraction techniques and advanced deep learning components that can discern relevant foreground information in complex backgrounds. Firstly, an enriched encoder block combines full feature fusion (FFF) with atrous spatial pyramid pooling (ASPP) at varying dilation rates to efficiently capture multi-scale contextual information, improving salient object detection against complex backgrounds and reducing information loss during down-sampling. Secondly, an attention module that refines the decoder is constructed to enhance the detection of salient objects in complex backgrounds by selectively focusing on relevant features; this allows the model to reconstruct detailed and contextually relevant information, which is essential for determining salient objects accurately. Finally, the architecture is further improved by adding a residual block at the encoder end, which is responsible for both saliency prediction and saliency-map refinement. The proposed network is designed to learn the transformation between input images and ground truth, enabling accurate segmentation of salient object regions with clear borders and accurate prediction of fine structures. SODU2-NET demonstrates superior performance on five public datasets (DUTS, SOD, DUT-OMRON, HKU-IS, and PASCAL-S) as well as a new real-world dataset, the Changsha dataset. In a comparative assessment against FCN, SqueezeNet, DeepLab, and Mask R-CNN, the proposed SODU2-NET achieves improvements of 6% in precision, 5% in recall, and 3% in accuracy. Overall, the approach shows promise for improving the accuracy and efficiency of salient object detection in a variety of settings.
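For readers unfamiliar with the building blocks named in the abstract, the sketch below illustrates an ASPP block with several dilation rates for multi-scale context and a simple channel-attention gate used to re-weight features. The layer widths, dilation rates, and attention formulation are illustrative assumptions only, not the authors' actual SODU2-NET implementation.

```python
# Minimal, hypothetical sketch of ASPP + a channel-attention gate (PyTorch).
# All hyperparameters here are assumptions for illustration, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel dilated convolutions fused by a 1x1 conv."""

    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # Image-level pooling branch, as in common ASPP variants.
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
        )
        self.fuse = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        return self.fuse(torch.cat(feats + [pooled], dim=1))


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gate that re-weights feature channels."""

    def __init__(self, ch: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.mlp(x.mean(dim=(-2, -1)))        # (B, C) channel descriptor
        return x * w.unsqueeze(-1).unsqueeze(-1)  # broadcast gate over H x W


if __name__ == "__main__":
    x = torch.randn(1, 64, 56, 56)
    y = ChannelAttention(32)(ASPP(64, 32)(x))
    print(y.shape)  # torch.Size([1, 32, 56, 56])
```

The dilation rates (1, 6, 12, 18) give each parallel branch a different receptive field over the same feature map, which is the mechanism the abstract credits with capturing multi-scale context in cluttered scenes; the attention gate then suppresses channels dominated by background responses.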