Li Guanbin, Chen Zhuohua, Mao Mingzhi, Lin Liang, Fang Chaowei
IEEE Trans Image Process. 2024;33:5510-5524. doi: 10.1109/TIP.2024.3413598. Epub 2024 Oct 4.
Thanks to advances in deep learning, the performance of salient object detection (SOD) has improved significantly. However, deep-learning-based techniques require large amounts of pixel-wise annotation. To relieve the annotation burden, a variety of deep weakly-supervised and unsupervised SOD methods have been proposed, yet a significant performance gap remains between them and fully supervised methods. In this paper, we propose a novel, cost-efficient salient object detection framework that adapts models from synthetic data to real-world data with the help of a limited number of actively selected annotations. Specifically, we first construct a synthetic SOD dataset by copying and pasting foreground objects onto pure background images. With the masks of the foreground objects taken as ground-truth saliency maps, this dataset is used to train the SOD model initially. However, due to the large domain gap between synthetic and real-world images, the initially trained model performs poorly on real-world images. To transfer the model from the synthetic dataset to real-world datasets, we further design an uncertainty-aware active domain adaptation algorithm to generate labels for the real-world target images. The variance of predictions under data augmentations is used to compute superpixel-level uncertainty values. For superpixels with relatively low uncertainty, we generate pseudo labels directly from the network predictions; meanwhile, we select a few superpixels with high uncertainty scores and label them manually. This labeling strategy produces high-quality labels without incurring excessive annotation cost. Experimental results on six benchmark SOD datasets demonstrate that our method outperforms existing state-of-the-art weakly-supervised and unsupervised SOD methods and is even comparable to fully supervised ones.
Code will be released at: https://github.com/czh-3/UADA.
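The uncertainty-aware labeling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released implementation: the function name `uncertainty_split`, the variance threshold `tau_low`, and the query budget `top_k` are hypothetical, and the sketch assumes the K augmented predictions have already been aligned back to the original image grid.

```python
import numpy as np

def uncertainty_split(preds, superpixels, tau_low=0.01, top_k=5):
    """Split superpixels into pseudo-labeled and actively queried sets.

    preds:       (K, H, W) saliency predictions under K data augmentations,
                 aligned back to the original image coordinates.
    superpixels: (H, W) integer superpixel label map.
    Returns a dict {superpixel_id: pseudo_label} for low-uncertainty
    superpixels and a list of high-uncertainty superpixel ids to label
    manually.
    """
    var_map = preds.var(axis=0)    # per-pixel prediction variance
    mean_map = preds.mean(axis=0)  # per-pixel mean prediction

    pseudo, query_scores = {}, {}
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        u = var_map[mask].mean()   # superpixel-level uncertainty
        if u < tau_low:
            # confident region: pseudo label from the mean prediction
            pseudo[int(sp)] = int(mean_map[mask].mean() > 0.5)
        else:
            query_scores[int(sp)] = u

    # actively query the top_k most uncertain superpixels for manual labels
    query = sorted(query_scores, key=query_scores.get, reverse=True)[:top_k]
    return pseudo, query
```

On a 2x2 toy map with two superpixels, a region whose predictions agree across augmentations receives a pseudo label, while a region with high prediction variance is routed to the manual-annotation queue.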