SpVOS: Efficient Video Object Segmentation With Triple Sparse Convolution.

Authors

Lin Weihao, Chen Tao, Yu Chong

Publication information

IEEE Trans Image Process. 2023;32:5977-5991. doi: 10.1109/TIP.2023.3327588. Epub 2023 Nov 7.

DOI:10.1109/TIP.2023.3327588

Abstract

Semi-supervised video object segmentation (Semi-VOS), which requires annotating only the first frame of a video to segment subsequent frames, has received increasing attention recently. Among existing Semi-VOS pipelines, memory-matching-based methods are becoming the main research stream, as they can fully exploit temporal sequence information to obtain high-quality segmentation results. Although these methods achieve promising performance, the overall framework still suffers from heavy computation overhead, mainly caused by per-frame dense convolution operations between high-resolution feature maps and each kernel filter. We therefore propose a sparse VOS baseline named SpVOS, which develops a novel triple sparse convolution to reduce the computation cost of the overall VOS framework. The designed triple gate, taking into account both the spatial and temporal redundancy between adjacent video frames, adaptively makes a three-way decision on how to apply the sparse convolution at each pixel, controlling the computation overhead of each layer while maintaining sufficient discrimination capability to distinguish similar objects and avoid error accumulation. A mixed sparse training strategy, coupled with an objective that incorporates a sparsity constraint, is also developed to balance VOS segmentation performance and computation cost. Experiments are conducted on two mainstream VOS datasets, DAVIS and YouTube-VOS. Results show that the proposed SpVOS outperforms other state-of-the-art sparse methods and remains comparable to the typical non-sparse VOS baseline, with an overall score of 83.04% on the DAVIS-2017 validation set and 79.29% on the YouTube-VOS validation set (versus 82.88% and 80.36% for the baseline), while saving up to 42% of FLOPs, showing its potential for resource-constrained scenarios.
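
To make the FLOPs claim concrete, the following is a minimal sketch of how skipping pixels reduces the cost of one convolution layer. It is not the authors' implementation: the abstract does not specify the three gate options, so the sketch models only the fraction of output pixels that are actually computed. The function names, the layer shape (64 x 64, 256 channels, 3 x 3 kernel), and the 0.58 active fraction are all assumptions chosen for illustration, and the cost of the gate itself is ignored.

```python
def conv_macs(height, width, c_in, c_out, kernel, active_fraction=1.0):
    """Multiply-accumulate count for a kernel x kernel convolution that is
    evaluated only on a fraction of the output pixels (hypothetical helper)."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be in [0, 1]")
    # Number of output positions where the convolution is actually computed.
    active_pixels = int(round(height * width * active_fraction))
    # Each computed output pixel costs c_in * kernel^2 MACs per output channel.
    return active_pixels * c_in * c_out * kernel * kernel


def relative_saving(dense_macs, sparse_macs):
    """Fraction of dense compute avoided by the sparse layer."""
    return 1.0 - sparse_macs / dense_macs


# Assumed layer shape and gate density, for illustration only.
dense = conv_macs(64, 64, 256, 256, 3)
sparse = conv_macs(64, 64, 256, 256, 3, active_fraction=0.58)

print(f"dense MACs:  {dense}")
print(f"sparse MACs: {sparse}")
print(f"saving:      {relative_saving(dense, sparse):.1%}")
```

Under this simplified accounting, the saving of a layer equals the fraction of pixels the gate skips, which is why a sparsity constraint in the training objective translates directly into a compute budget.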


Similar articles

Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels.

IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):2595-2612. doi: 10.1109/TPAMI.2022.3163375. Epub 2023 Jan 6.

Region Aware Video Object Segmentation With Deep Motion Modeling.

IEEE Trans Image Process. 2024;33:2639-2651. doi: 10.1109/TIP.2024.3381445. Epub 2024 Apr 3.

Adaptive Sparse Memory Networks for Efficient and Robust Video Object Segmentation.

IEEE Trans Neural Netw Learn Syst. 2025 Feb;36(2):3820-3833. doi: 10.1109/TNNLS.2024.3357118. Epub 2025 Feb 6.

Beyond Appearance: Multi-Frame Spatio-Temporal Context Memory Networks for Efficient and Robust Video Object Segmentation.

IEEE Trans Image Process. 2024;33:4853-4866. doi: 10.1109/TIP.2024.3423390. Epub 2024 Sep 5.

Scalable Video Object Segmentation With Identification Mechanism.

IEEE Trans Pattern Anal Mach Intell. 2024 Sep;46(9):6247-6262. doi: 10.1109/TPAMI.2024.3383592. Epub 2024 Aug 6.

Adaptive Selection of Reference Frames for Video Object Segmentation.

IEEE Trans Image Process. 2022;31:1057-1071. doi: 10.1109/TIP.2021.3137660. Epub 2022 Jan 19.

Directional Deep Embedding and Appearance Learning for Fast Video Object Segmentation.

IEEE Trans Neural Netw Learn Syst. 2022 Aug;33(8):3884-3894. doi: 10.1109/TNNLS.2021.3054769. Epub 2022 Aug 3.

Reliability-Guided Hierarchical Memory Network for Scribble-Supervised Video Object Segmentation.

IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):7514-7528. doi: 10.1109/TNNLS.2024.3389008. Epub 2025 Apr 4.

Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration.

IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):4701-4712. doi: 10.1109/TPAMI.2021.3081597. Epub 2022 Aug 4.

PMID:37906477
