Li Guorong, Hong Dexiang, Xu Kai, Zhong Bineng, Su Li, Han Zhenjun, Huang Qingming
IEEE Trans Neural Netw Learn Syst. 2024 Jun;35(6):7671-7684. doi: 10.1109/TNNLS.2022.3219936. Epub 2024 Jun 3.
Recently, self-supervised video object segmentation (VOS) has attracted much interest. However, most proxy tasks are proposed to train only a single backbone, which relies on a point-to-point correspondence strategy to propagate masks through a video sequence. Because this pipeline is so simple, the performance of the single-backbone paradigm is still unsatisfactory. Instead of following the previous literature, we propose a self-supervised progressive network (SSPNet), which consists of a memory retrieval module (MRM) and a collaborative refinement module (CRM). The MRM performs point-to-point correspondence and produces a propagated coarse mask for a query frame through self-supervised pixel-level and frame-level similarity learning. The CRM, which is trained via cycle-consistency region tracking, aggregates the reference and query information and implicitly learns the collaborative relationship between them to refine the coarse mask. Furthermore, to learn semantic knowledge from unlabeled data, we also design two novel mask-generation strategies that provide the CRM with training data carrying meaningful semantic information. Extensive experiments conducted on DAVIS-17, YouTube-VOS, and SegTrack v2 demonstrate that our method surpasses the state-of-the-art self-supervised methods and narrows the gap with fully supervised methods.
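The point-to-point correspondence strategy mentioned in the abstract can be illustrated with a generic affinity-based label propagation step. The following is a minimal sketch under stated assumptions, not the SSPNet MRM itself: the function name `propagate_mask`, the dot-product similarity, the `temperature` and `top_k` parameters, and the toy features are all hypothetical choices made for illustration. In practice the per-pixel features would come from a learned backbone.

```python
import math


def propagate_mask(ref_feats, ref_labels, query_feats, num_classes,
                   temperature=0.1, top_k=2):
    """Propagate per-pixel labels from a reference frame to a query frame.

    ref_feats / query_feats: lists of feature vectors, one per (flattened) pixel.
    ref_labels: integer object id for each reference pixel.
    All names and defaults here are illustrative assumptions.
    """
    query_labels = []
    for q in query_feats:
        # Dot-product similarity between this query pixel and every reference pixel.
        sims = [sum(a * b for a, b in zip(q, r)) for r in ref_feats]
        # Keep only the top_k most similar reference pixels.
        best = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:top_k]
        # Softmax over the retained similarities (max-subtracted for stability).
        peak = max(sims[i] for i in best)
        weights = [math.exp((sims[i] - peak) / temperature) for i in best]
        total = sum(weights)
        # Weighted vote over the labels of the retained reference pixels.
        scores = [0.0] * num_classes
        for i, w in zip(best, weights):
            scores[ref_labels[i]] += w / total
        query_labels.append(max(range(num_classes), key=lambda c: scores[c]))
    return query_labels


# Toy example: 4 reference pixels with 2-D features and 2 object ids.
ref_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
ref_labels = [0, 0, 1, 1]
query_feats = [[0.8, 0.2], [0.2, 0.8]]
coarse_mask = propagate_mask(ref_feats, ref_labels, query_feats, num_classes=2)
```

Each query pixel takes the label that its most similar reference pixels vote for, which yields the kind of coarse mask that a separate refinement stage could then improve.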