Beyond Appearance: Multi-Frame Spatio-Temporal Context Memory Networks for Efficient and Robust Video Object Segmentation.

Author Information

Dang Jisheng, Zheng Huicheng, Xu Xiaohao, Wang Longguang, Guo Yulan

Publication Information

IEEE Trans Image Process. 2024;33:4853-4866. doi: 10.1109/TIP.2024.3423390. Epub 2024 Sep 5.

Abstract

Current video object segmentation approaches primarily rely on frame-wise appearance information to perform matching. Despite significant progress, reliable matching becomes challenging due to rapid changes of the object's appearance over time. Moreover, previous matching mechanisms suffer from redundant computation and noise interference as the number of accumulated frames increases. In this paper, we introduce a multi-frame spatio-temporal context memory (STCM) network to exploit discriminative spatio-temporal cues in multiple adjacent frames by utilizing a multi-frame context interaction module (MCI) for memory construction. Based on the proposed MCI module, a sparse group memory reader is developed to enable efficient sparse matching during memory reading. Our proposed method is generic and achieves state-of-the-art performance with real-time speed on benchmark datasets such as DAVIS and YouTube-VOS. In addition, our model exhibits robustness to sparse videos with low frame rates.
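The abstract describes matching the current frame against a feature memory built from multiple adjacent frames, with a sparse group memory reader that limits matching to the most relevant memory entries. The full paper is not reproduced here, so the sketch below is only a generic illustration of sparse key-value memory reading as used in memory-based video object segmentation; the function name, tensor shapes, and top-k selection are assumptions for illustration, not the authors' STCM/MCI implementation.

```python
import torch
import torch.nn.functional as F

def sparse_memory_read(query_key, memory_keys, memory_values, top_k=32):
    """Generic sparse key-value memory read (illustrative, not the paper's module).

    query_key:     (C, Q)   key features of the current frame (Q = H*W positions)
    memory_keys:   (C, M)   key features stacked from several memory frames
    memory_values: (Cv, M)  value features at the same memory locations

    Returns a (Cv, Q) readout aggregated from the top_k most similar
    memory locations per query position.
    """
    # Dense affinity between every query position and every memory position.
    affinity = torch.einsum("cq,cm->qm", query_key, memory_keys)   # (Q, M)

    # Sparse matching: keep only the top_k memory entries per query position.
    topk_vals, topk_idx = affinity.topk(top_k, dim=1)              # (Q, k)
    weights = F.softmax(topk_vals, dim=1)                          # (Q, k)

    # Gather the selected memory values and take their weighted sum.
    gathered = memory_values[:, topk_idx]                          # (Cv, Q, k)
    return torch.einsum("qk,cqk->cq", weights, gathered)           # (Cv, Q)


# Toy usage: reading a 3-frame memory for a 24x24 feature map.
if __name__ == "__main__":
    C, Cv, hw = 64, 128, 24 * 24
    readout = sparse_memory_read(
        torch.randn(C, hw),          # current-frame keys
        torch.randn(C, 3 * hw),      # keys from 3 memory frames
        torch.randn(Cv, 3 * hw),     # values from 3 memory frames
    )
    print(readout.shape)             # torch.Size([128, 576])
```

Restricting each query position to its top-k memory matches is one common way to keep memory reading cheap as stored frames accumulate and to suppress noisy correspondences, which is the efficiency and robustness trade-off the abstract highlights.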

