Hierarchical Graph Pattern Understanding for Zero-Shot Video Object Segmentation

Authors

Pei Gensheng, Shen Fumin, Yao Yazhou, Chen Tao, Hua Xian-Sheng, Shen Heng-Tao

Publication information

IEEE Trans Image Process. 2023;32:5909-5920. doi: 10.1109/TIP.2023.3326395. Epub 2023 Nov 1.

DOI:10.1109/TIP.2023.3326395

Abstract

The optical flow guidance strategy is ideal for obtaining motion information of objects in the video. It is widely utilized in video segmentation tasks. However, existing optical flow-based methods have a significant dependency on optical flow, which results in poor performance when the optical flow estimation fails for a particular scene. The temporal consistency provided by the optical flow could be effectively supplemented by modeling in a structural form. This paper proposes a new hierarchical graph neural network (GNN) architecture, dubbed hierarchical graph pattern understanding (HGPU), for zero-shot video object segmentation (ZS-VOS). Inspired by the strong ability of GNNs in capturing structural relations, HGPU innovatively leverages motion cues (i.e., optical flow) to enhance the high-order representations from the neighbors of target frames. Specifically, a hierarchical graph pattern encoder with message aggregation is introduced to acquire different levels of motion and appearance features in a sequential manner. Furthermore, a decoder is designed for hierarchically parsing and understanding the transformed multi-modal contexts to achieve more accurate and robust results. HGPU achieves state-of-the-art performance on four publicly available benchmarks (DAVIS-16, YouTube-Objects, Long-Videos and DAVIS-17). Code and pre-trained model can be found at https://github.com/NUST-Machine-Intelligence-Laboratory/HGPU.
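The message aggregation mentioned in the abstract can be illustrated with a generic toy example. This is only a minimal sketch of one round of mean-aggregation message passing on a small graph, not the HGPU encoder itself (the authors' code is in the repository linked above). The function names, the mixing weight `alpha`, the node names, and the two-dimensional feature vectors are all assumptions made for illustration.

```python
from typing import Dict, List

Vector = List[float]


def mean_aggregate(messages: List[Vector]) -> Vector:
    """Element-wise mean of equally sized feature vectors."""
    dim = len(messages[0])
    return [sum(m[i] for m in messages) / len(messages) for i in range(dim)]


def message_passing_step(
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    alpha: float = 0.5,
) -> Dict[str, Vector]:
    """One synchronous round: blend each node with the mean of its neighbors.

    alpha is a hypothetical mixing weight, not a parameter from the paper.
    """
    updated = {}
    for node, own in features.items():
        incoming = [features[n] for n in neighbors.get(node, [])]
        if not incoming:
            # Isolated node: keep its features unchanged.
            updated[node] = list(own)
            continue
        agg = mean_aggregate(incoming)
        updated[node] = [(1 - alpha) * o + alpha * a for o, a in zip(own, agg)]
    return updated


# Toy graph (assumed): appearance nodes for two frames plus one motion
# (optical flow) node, all connected to each other.
features = {
    "frame_t": [1.0, 0.0],
    "frame_t1": [0.0, 1.0],
    "flow_t": [2.0, 2.0],
}
neighbors = {
    "frame_t": ["frame_t1", "flow_t"],
    "frame_t1": ["frame_t", "flow_t"],
    "flow_t": ["frame_t", "frame_t1"],
}
out = message_passing_step(features, neighbors)
```

After one step, each appearance node has absorbed part of the motion node's features and vice versa. A learned GNN would replace the fixed mean and `alpha` with trainable functions.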

