Zhuge Yunzhi, Gu Hongyu, Zhang Lu, Qi Jinqing, Lu Huchuan
IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9084-9097. doi: 10.1109/TNNLS.2024.3418980. Epub 2025 May 2.
In this article, we address the challenges in unsupervised video object segmentation (UVOS) by proposing an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues. Unlike previous methods that focus solely on integrating appearance with motion or on modeling temporal relations, our method combines both aspects within a unified framework. MTNet is devised by effectively merging appearance and motion features during the feature extraction process within encoders, promoting a more complementary representation. To capture the intricate long-range contextual dynamics and information embedded within videos, a temporal transformer module is introduced, facilitating efficacious interframe interactions throughout a video clip. Furthermore, we employ a cascade of decoders across all feature levels to optimally exploit the derived features, aiming to generate increasingly precise segmentation masks. As a result, MTNet provides a strong and compact framework that exploits both temporal and cross-modality knowledge to robustly and accurately localize and track the primary object in various challenging scenarios. Extensive experiments across diverse benchmarks conclusively show that our method not only attains state-of-the-art performance in UVOS but also delivers competitive results in video salient object detection (VSOD). These findings highlight the method's robust versatility and its adeptness in adapting to a range of segmentation tasks. The source code is available at https://github.com/hy0523/MTNet.
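The abstract describes three components: encoders that fuse appearance (RGB) and motion (optical flow) features, a temporal transformer that lets frames of a clip interact, and a cascade of decoders that progressively refine the masks. The following is a minimal PyTorch sketch of that pipeline shape only; all layer sizes, module names, and the token layout are illustrative assumptions, not the paper's actual architecture (see the linked repository for the real implementation).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MTNetSketch(nn.Module):
    """Illustrative sketch of the fuse -> temporal-attend -> decode pipeline.

    All dimensions and layer choices here are assumptions for demonstration.
    """

    def __init__(self, dim: int = 64):
        super().__init__()
        # Appearance and motion encoders (stand-ins for the paper's backbones);
        # optical flow inputs have 2 channels (horizontal/vertical displacement).
        self.app_enc = nn.Conv2d(3, dim, 3, stride=2, padding=1)
        self.mot_enc = nn.Conv2d(2, dim, 3, stride=2, padding=1)
        # Merge the two modalities inside the encoder stage.
        self.fuse = nn.Conv2d(2 * dim, dim, 1)
        # Temporal transformer: tokens from every frame of the clip attend
        # to each other, modeling long-range interframe context.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=1)
        # A (two-stage) decoder cascade refines features into a mask.
        self.dec1 = nn.Conv2d(dim, dim, 3, padding=1)
        self.dec2 = nn.Conv2d(dim, 1, 3, padding=1)

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # rgb: (T, 3, H, W) and flow: (T, 2, H, W) for a single clip of T frames.
        f = self.fuse(torch.cat([self.app_enc(rgb), self.mot_enc(flow)], dim=1))
        t, c, h, w = f.shape
        # Flatten all frames' spatial grids into one token sequence so the
        # transformer performs interframe (not just intraframe) interaction.
        tokens = f.flatten(2).permute(0, 2, 1).reshape(1, t * h * w, c)
        tokens = self.temporal(tokens)
        f = tokens.reshape(t, h * w, c).permute(0, 2, 1).reshape(t, c, h, w)
        # Cascaded decoding: coarse features, then a per-pixel mask probability.
        masks = torch.sigmoid(self.dec2(torch.relu(self.dec1(f))))
        # Upsample back to the input resolution.
        return F.interpolate(masks, scale_factor=2, mode="bilinear")
```

A usage pass on a 4-frame clip of 32x32 frames produces one mask per frame, with values in [0, 1].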