
Hierarchical Co-Attention Propagation Network for Zero-Shot Video Object Segmentation.

Publication Information

IEEE Trans Image Process. 2023;32:2348-2359. doi: 10.1109/TIP.2023.3267244. Epub 2023 Apr 25.

Abstract

Zero-shot video object segmentation (ZS-VOS) aims to segment foreground objects in a video sequence without prior knowledge of these objects. However, existing ZS-VOS methods often struggle to distinguish between foreground and background or to keep track of the foreground in complex scenarios. The common practice of introducing motion information, such as optical flow, can lead to overreliance on optical flow estimation. To address these challenges, we propose an encoder-decoder-based hierarchical co-attention propagation network (HCPN) capable of tracking and segmenting objects. Specifically, our model is built upon multiple collaborative evolutions of the parallel co-attention module (PCM) and the cross co-attention module (CCM). PCM captures common foreground regions among adjacent appearance and motion features, while CCM further exploits and fuses cross-modal motion features returned by PCM. Our method is progressively trained to achieve hierarchical spatio-temporal feature propagation across the entire video. Experimental results demonstrate that our HCPN outperforms all previous methods on public benchmarks, showcasing its effectiveness for ZS-VOS. Code and pre-trained model can be found at https://github.com/NUST-Machine-Intelligence-Laboratory/HCPN.
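The abstract describes co-attention between appearance and motion features, where each modality attends to the other to highlight common foreground regions. The sketch below is a minimal, hypothetical illustration of that generic co-attention idea using an affinity matrix between two feature maps; it is not the paper's actual PCM/CCM implementation, and all function and variable names are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(appearance, motion):
    """Hypothetical co-attention step between two modalities.

    appearance, motion: (N, C) arrays of N spatial locations
    with C channels each. An (N, N) affinity matrix scores the
    similarity between every appearance location and every motion
    location; softmax-normalized rows then mix features from the
    other modality, so regions salient in both are reinforced.
    """
    affinity = appearance @ motion.T                 # (N, N) cross-modal similarity
    att_motion = softmax(affinity, axis=1) @ motion  # appearance attends to motion
    att_appearance = softmax(affinity.T, axis=1) @ appearance  # motion attends to appearance
    return att_appearance, att_motion

# Toy usage: 16 locations with 8 channels per modality.
rng = np.random.default_rng(0)
app = rng.standard_normal((16, 8))
mot = rng.standard_normal((16, 8))
att_app, att_mot = co_attention(app, mot)
```

The attended outputs keep the original feature shapes, so in an encoder-decoder network like the one described they could be fused back with the inputs and propagated across adjacent frames.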

