Haller Emanuela, Florea Adina Magda, Leordeanu Marius
IEEE Trans Pattern Anal Mach Intell. 2022 Nov;44(11):7638-7656. doi: 10.1109/TPAMI.2021.3120228. Epub 2022 Oct 4.
We propose a dual system for unsupervised object segmentation in video, which brings together two modules with complementary properties: a space-time graph that discovers objects in videos and a deep network that learns powerful object features. The system uses an iterative knowledge exchange policy. A novel spectral space-time clustering process on the graph produces unsupervised segmentation masks, which are passed to the network as pseudo-labels. The network learns to segment in single frames what the graph discovers in video, and passes back to the graph strong image-level features that improve its node-level features in the next iteration. Knowledge is exchanged for several cycles until convergence. The graph has one node per video pixel, yet object discovery is fast: a novel power iteration algorithm computes the main space-time cluster as the principal eigenvector of a special Feature-Motion matrix without ever explicitly constructing the matrix. A thorough experimental analysis validates our theoretical claims and demonstrates the effectiveness of the cyclical knowledge exchange. We also perform experiments in the supervised scenario, incorporating features pretrained with human supervision. We achieve state-of-the-art results in both the unsupervised and supervised scenarios on four challenging datasets: DAVIS, SegTrack, YouTube-Objects, and DAVSOD. We will make our code publicly available.
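The abstract's central computational idea, finding a principal eigenvector by power iteration without ever forming the matrix, can be illustrated with a minimal sketch. This is not the authors' Feature-Motion matrix or their algorithm. It assumes a hypothetical stand-in matrix M = F F^T built from a small nonnegative feature table `features` (one row per node), so that the product M v can be evaluated as F (F^T v) without forming the n-by-n matrix. The names `matvec_implicit`, `power_iteration`, and `features` are all illustrative.

```python
import math


def matvec_implicit(features, v):
    """Return (F F^T) v without forming the n-by-n matrix F F^T.

    `features` is an n-by-d list of lists (hypothetical node features).
    """
    d = len(features[0])
    # projected = F^T v, a length-d vector
    projected = [0.0] * d
    for row, vi in zip(features, v):
        for k in range(d):
            projected[k] += row[k] * vi
    # result = F projected, a length-n vector
    return [sum(row[k] * projected[k] for k in range(d)) for row in features]


def power_iteration(features, num_iters=100, tol=1e-10):
    """Approximate the principal eigenvector of the implicit matrix F F^T."""
    n = len(features)
    v = [1.0 / math.sqrt(n)] * n  # uniform, unit-norm starting vector
    for _ in range(num_iters):
        w = matvec_implicit(features, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: all features are zero
        w = [x / norm for x in w]
        delta = max(abs(a - b) for a, b in zip(w, v))
        v = w
        if delta < tol:
            break
    return v


# Toy data (made up for illustration): nodes 0-2 share similar features,
# nodes 3-4 are weakly related to them.
features = [
    [1.0, 0.1],
    [0.9, 0.0],
    [1.1, 0.2],
    [0.0, 0.3],
    [0.1, 0.2],
]
v = power_iteration(features)
# Larger entries of v mark nodes belonging to the dominant cluster.
soft_mask = [x / max(v) for x in v]
```

Each iteration here costs O(n d) time and memory rather than the O(n^2) an explicit matrix would need, which is the general reason a matrix-free formulation can scale to a graph with one node per video pixel. How the paper's own matrix is factored is not stated in the abstract, so the factorization used above is an assumption made only for this sketch.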