Wang Yangtao, Shen Xi, Yuan Yuan, Du Yuming, Li Maomao, Hu Shell Xu, Crowley James L, Vaufreydaz Dominique
IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15790-15801. doi: 10.1109/TPAMI.2023.3305122. Epub 2023 Nov 3.
In this paper, we describe a graph-based algorithm that uses the features obtained by a self-supervised transformer to detect and segment salient objects in images and videos. With this approach, the image patches that compose an image or video are organised into a fully connected graph, in which the edge between each pair of patches is labeled with a similarity score based on the features learned by the transformer. Detection and segmentation of salient objects can then be formulated as a graph-cut problem and solved using the classical Normalized Cut algorithm. Despite the simplicity of this approach, it achieves state-of-the-art results on several common image and video detection and segmentation tasks. For unsupervised object discovery, this approach outperforms the competing approaches by margins of 6.1%, 5.7%, and 2.6% when tested with the VOC07, VOC12, and COCO20K datasets, respectively. For the unsupervised saliency detection task in images, this method improves the Intersection over Union (IoU) score by 4.4%, 5.6%, and 5.2% when tested with the ECSSD, DUTS, and DUT-OMRON datasets, respectively. This method also achieves competitive results for unsupervised video object segmentation tasks with the DAVIS, SegTV2, and FBMS datasets.
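The graph-cut formulation described above can be illustrated with a minimal NumPy sketch. It assumes the patch features are already available as an array with one row per patch. The function name `ncut_bipartition`, the cosine-similarity edge score, the thresholding of edges with `tau` and `eps`, and the split at the mean of the eigenvector are assumptions made for illustration; the abstract specifies only that edges carry a feature-based similarity score and that the classical Normalized Cut algorithm is applied.

```python
import numpy as np

def ncut_bipartition(features, tau=0.2, eps=1e-5):
    """Split patches into two groups with a Normalized Cut relaxation.

    features: (N, D) array, one feature vector per patch (assumed non-zero).
    tau, eps: hypothetical edge threshold and floor weight for this sketch.
    Returns a boolean array of length N marking one side of the cut.
    """
    feats = np.asarray(features, dtype=float)
    # Cosine similarity between every pair of patches (fully connected graph).
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T
    # Assumed edge weighting: strong edge if similar, tiny weight otherwise,
    # so the graph stays connected.
    W = np.where(sim >= tau, 1.0, eps)
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 belongs to the
    # second-smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(L_sym)
    # Map back to the generalized problem (D - W) x = lambda * D x.
    second = d_inv_sqrt * eigvecs[:, 1]
    # Assumed splitting rule: threshold at the mean of the eigenvector.
    return second > second.mean()
```

The returned mask only separates the patches into two groups. Deciding which group corresponds to the salient object, and turning the patch-level mask into a bounding box or pixel-level segmentation, are further steps that this sketch does not cover.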