
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation

Authors

Zhuge Yunzhi, Gu Hongyu, Zhang Lu, Qi Jinqing, Lu Huchuan

Publication

IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9084-9097. doi: 10.1109/TNNLS.2024.3418980. Epub 2025 May 2.

DOI: 10.1109/TNNLS.2024.3418980
PMID: 38976474
Abstract

In this article, we address the challenges in unsupervised video object segmentation (UVOS) by proposing an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues. Unlike previous methods that focus solely on integrating appearance with motion or on modeling temporal relations, our method combines both aspects by integrating them within a unified framework. MTNet is devised by effectively merging appearance and motion features during the feature extraction process within encoders, promoting a more complementary representation. To capture the intricate long-range contextual dynamics and information embedded within videos, a temporal transformer module is introduced, facilitating efficacious interframe interactions throughout a video clip. Furthermore, we employ a cascade of decoders across all feature levels to optimally exploit the derived features, aiming to generate increasingly precise segmentation masks. As a result, MTNet provides a strong and compact framework that explores both temporal and cross-modality knowledge to accurately localize and track the primary object across diverse challenging scenarios. Extensive experiments across diverse benchmarks conclusively show that our method not only attains state-of-the-art performance in UVOS but also delivers competitive results in video salient object detection (VSOD). These findings highlight the method's robust versatility and its adeptness in adapting to a range of segmentation tasks. The source code is available at https://github.com/hy0523/MTNet.
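The abstract's temporal transformer module is described as enabling interframe interactions across a video clip. The sketch below is not the paper's implementation (see the linked repository for that); it is a minimal NumPy illustration, under stated assumptions, of the underlying idea: flattening per-frame spatial tokens into one sequence so that self-attention lets every token attend to tokens in every other frame. The function name, shapes, and identity projections are all illustrative choices, not details from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(frame_feats):
    """Toy cross-frame self-attention.

    frame_feats: array of shape (T, N, C) — T frames, N spatial
    tokens per frame, C channels per token.
    """
    T, N, C = frame_feats.shape
    # Concatenate tokens from all frames into one sequence so that
    # attention spans frame boundaries (interframe interaction).
    tokens = frame_feats.reshape(T * N, C)
    # Identity query/key/value projections for simplicity; a real
    # transformer layer would apply learned weight matrices here.
    q, k, v = tokens, tokens, tokens
    attn = softmax(q @ k.T / np.sqrt(C), axis=-1)  # (T*N, T*N)
    out = attn @ v
    return out.reshape(T, N, C)

rng = np.random.default_rng(0)
feats = rng.random((4, 16, 8))   # 4 frames, 16 tokens, 8 channels
out = temporal_attention(feats)
print(out.shape)  # (4, 16, 8)
```

Because each output token is a convex combination of input tokens, the output stays within the range of the input features and keeps the same shape, which is what allows such a module to sit between an encoder and a cascade of decoders.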


Similar Articles

1. Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation.
   IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9084-9097. doi: 10.1109/TNNLS.2024.3418980. Epub 2025 May 2.
2. Unsupervised Online Video Object Segmentation With Motion Property Understanding.
   IEEE Trans Image Process. 2020;29:237-249. doi: 10.1109/TIP.2019.2930152. Epub 2019 Jul 26.
3. Paying Attention to Video Object Pattern Understanding.
   IEEE Trans Pattern Anal Mach Intell. 2021 Jul;43(7):2413-2428. doi: 10.1109/TPAMI.2020.2966453. Epub 2021 Jun 8.
4. Learning Complementary Spatial-Temporal Transformer for Video Salient Object Detection.
   IEEE Trans Neural Netw Learn Syst. 2024 Aug;35(8):10663-10673. doi: 10.1109/TNNLS.2023.3243246. Epub 2024 Aug 5.
5. DVIS++: Improved Decoupled Framework for Universal Video Segmentation.
   IEEE Trans Pattern Anal Mach Intell. 2025 Jul;47(7):5918-5929. doi: 10.1109/TPAMI.2025.3552694.
6. Hierarchical Graph Pattern Understanding for Zero-Shot Video Object Segmentation.
   IEEE Trans Image Process. 2023;32:5909-5920. doi: 10.1109/TIP.2023.3326395. Epub 2023 Nov 1.
7. Language-Aware Vision Transformer for Referring Segmentation.
   IEEE Trans Pattern Anal Mach Intell. 2025 Jul;47(7):5238-5255. doi: 10.1109/TPAMI.2024.3468640.
8. Hierarchical Co-Attention Propagation Network for Zero-Shot Video Object Segmentation.
   IEEE Trans Image Process. 2023;32:2348-2359. doi: 10.1109/TIP.2023.3267244. Epub 2023 Apr 25.
9. Zero-Shot Video Object Segmentation With Co-Attention Siamese Networks.
   IEEE Trans Pattern Anal Mach Intell. 2022 Apr;44(4):2228-2242. doi: 10.1109/TPAMI.2020.3040258. Epub 2022 Mar 4.
10. Video Question Answering With Prior Knowledge and Object-Sensitive Learning.
   IEEE Trans Image Process. 2022;31:5936-5948. doi: 10.1109/TIP.2022.3205212. Epub 2022 Sep 15.