
Video Object Segmentation without Temporal Information

Authors

Maninis K-K, Caelles S, Chen Y, Pont-Tuset J, Leal-Taixe L, Cremers D, Van Gool L

Publication

IEEE Trans Pattern Anal Mach Intell. 2019 Jun;41(6):1515-1530. doi: 10.1109/TPAMI.2018.2838670. Epub 2018 May 23.

Abstract

Video Object Segmentation, and video processing in general, has been historically dominated by methods that rely on the temporal consistency and redundancy in consecutive video frames. When the temporal smoothness is suddenly broken, such as when an object is occluded, or some frames are missing in a sequence, the result of these methods can deteriorate significantly. This paper explores the orthogonal approach of processing each frame independently, i.e., disregarding the temporal information. In particular, it tackles the task of semi-supervised video object segmentation: the separation of an object from the background in a video, given its mask in the first frame. We present Semantic One-Shot Video Object Segmentation (OSVOS$^\mathrm{S}$), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one shot). We show that instance-level semantic information, when combined effectively, can dramatically improve the results of our previous method, OSVOS. We perform experiments on two recent single-object video segmentation databases, which show that OSVOS$^\mathrm{S}$ is both the fastest and most accurate method in the state of the art. Experiments on multi-object video segmentation show that OSVOS$^\mathrm{S}$ obtains competitive results.
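The one-shot idea in the abstract can be illustrated with a toy sketch: a per-pixel logistic classifier stands in for the fully-convolutional network, is fine-tuned on the annotated mask of the first frame only, and is then applied to every frame independently, with no temporal information shared between frames. All function names, the synthetic feature model, and the classifier itself are illustrative assumptions, not the paper's actual architecture or training schedule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def finetune_on_first_frame(features, mask, lr=0.5, steps=200):
    """One-shot step: fit a per-pixel logistic classifier (a toy
    stand-in for the fully-convolutional network) to the mask of
    the first frame only, via plain gradient descent."""
    X = features.reshape(-1, features.shape[-1])  # (H*W, C)
    y = mask.reshape(-1).astype(float)            # (H*W,)
    w, b = np.zeros(X.shape[-1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def segment_frame(features, w, b, thresh=0.5):
    """Segment one frame independently of all others (no temporal cues)."""
    return sigmoid(features @ w + b) > thresh

# Toy sequence: foreground pixels carry a distinctive appearance cue.
rng = np.random.default_rng(0)
H, W, C = 8, 8, 3

def make_frame(mask):
    f = rng.normal(0.0, 0.1, size=(H, W, C))
    f[mask] += 1.0  # foreground appearance offset
    return f

mask0 = np.zeros((H, W), dtype=bool); mask0[2:6, 2:6] = True
mask1 = np.zeros((H, W), dtype=bool); mask1[3:7, 1:5] = True  # object moved

w, b = finetune_on_first_frame(make_frame(mask0), mask0)
pred1 = segment_frame(make_frame(mask1), w, b)
print(np.mean(pred1 == mask1))  # per-pixel accuracy on the unseen frame
```

Because the classifier learns the object's appearance rather than its motion, the second frame is segmented correctly even though the object has moved, which is the abstract's point about robustness to broken temporal smoothness.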

