Directional Deep Embedding and Appearance Learning for Fast Video Object Segmentation.

Authors

Yin Yingjie, Xu De, Wang Xingang, Zhang Lei

Publication information

IEEE Trans Neural Netw Learn Syst. 2022 Aug;33(8):3884-3894. doi: 10.1109/TNNLS.2021.3054769. Epub 2022 Aug 3.

Abstract

Most recent semi-supervised video object segmentation (VOS) methods rely on fine-tuning deep convolutional neural networks online, using the given mask of the first frame or the predicted masks of subsequent frames. However, online fine-tuning is usually time-consuming, which limits the practical use of such methods. We propose a directional deep embedding and appearance learning (DDEAL) method for fast VOS that requires no online fine-tuning. First, a global directional matching module (GDMM), which can be implemented efficiently with parallel convolutional operations, is proposed to learn a semantic pixel-wise embedding that serves as internal guidance. Second, an effective appearance model based on directional statistics is proposed to represent the target and background on a spherical embedding space. Equipped with the GDMM and the directional appearance model learning module, DDEAL learns static cues from the labeled first frame and dynamically updates the cues from subsequent frames for object segmentation. Our method achieves state-of-the-art VOS performance without online fine-tuning. Specifically, it reaches a J&F mean score of 74.8% on the DAVIS 2017 data set and an overall score G of 71.3% on the large-scale YouTube-VOS data set, while running at 25 fps on a single NVIDIA TITAN Xp GPU. A faster version runs at 31 fps with only a small loss in accuracy.
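
The idea of representing target and background as directions on a spherical embedding space, learned from the first frame and updated on later frames, can be illustrated with a minimal NumPy sketch. This is not the DDEAL implementation: the function names, the softmax over scaled cosine similarity, the concentration `kappa`, and the moving-average `rate` are all assumptions chosen for illustration, standing in for the paper's GDMM and statistical appearance model.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Project embeddings onto the unit sphere along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def mean_directions(emb, mask):
    """Mean unit direction of background (row 0) and target (row 1) pixels.

    emb:  (H, W, C) pixel embeddings.
    mask: (H, W) boolean target mask; both classes must be present.
    """
    emb = l2_normalize(emb)
    background = emb[~mask].mean(axis=0)
    target = emb[mask].mean(axis=0)
    return l2_normalize(np.stack([background, target]))


def segment(emb, directions, kappa=10.0):
    """Per-pixel target probability from scaled cosine similarity.

    kappa is a hypothetical concentration parameter, not a value from the paper.
    """
    cos = l2_normalize(emb) @ directions.T           # (H, W, 2)
    logits = kappa * cos
    logits -= logits.max(axis=-1, keepdims=True)     # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=-1, keepdims=True)
    return prob[..., 1]


def update(directions, emb, pred_mask, rate=0.1):
    """Blend stored directions with those estimated from a predicted mask.

    The moving-average rule and its rate are assumptions for illustration.
    """
    if pred_mask.all() or not pred_mask.any():
        return directions                            # nothing reliable to learn
    new = mean_directions(emb, pred_mask)
    return l2_normalize((1.0 - rate) * directions + rate * new)
```

In this sketch, `mean_directions` would be called once on the labeled first frame, after which each later frame is passed to `segment` and the thresholded prediction is fed to `update`. No network weights change at any point, which is the property that lets this family of methods avoid online fine-tuning.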
