Pei Gensheng, Shen Fumin, Yao Yazhou, Chen Tao, Hua Xian-Sheng, Shen Heng-Tao
IEEE Trans Image Process. 2023;32:5909-5920. doi: 10.1109/TIP.2023.3326395. Epub 2023 Nov 1.
Optical flow guidance is a natural way to obtain object motion information from video and is widely used in video segmentation tasks. However, existing optical flow-based methods depend heavily on the flow estimate, so their performance degrades when optical flow estimation fails for a particular scene. The temporal consistency provided by optical flow can be effectively supplemented by modeling it in a structural form. This paper proposes a new hierarchical graph neural network (GNN) architecture, dubbed hierarchical graph pattern understanding (HGPU), for zero-shot video object segmentation (ZS-VOS). Inspired by the strong ability of GNNs to capture structural relations, HGPU leverages motion cues (i.e., optical flow) to enhance the high-order representations from the neighbors of target frames. Specifically, a hierarchical graph pattern encoder with message aggregation is introduced to acquire different levels of motion and appearance features in a sequential manner. Furthermore, a decoder is designed to hierarchically parse and understand the transformed multi-modal contexts, yielding more accurate and robust results. HGPU achieves state-of-the-art performance on four publicly available benchmarks (DAVIS-16, YouTube-Objects, Long-Videos and DAVIS-17). Code and the pre-trained model are available at https://github.com/NUST-Machine-Intelligence-Laboratory/HGPU.
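The message aggregation mentioned in the abstract can be illustrated with a generic sketch. The following is not the HGPU encoder: it is a minimal, assumed example of one round of attention-weighted message passing over a small graph whose nodes stand in for per-frame appearance features and optical-flow (motion) features. The function name `aggregate_messages`, the dot-product affinity, the residual update, and the feature sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate_messages(node_feats, adjacency):
    """One round of attention-weighted message aggregation (illustrative only).

    node_feats: (N, D) array, one feature vector per graph node.
    adjacency:  (N, N) boolean array; adjacency[i, j] is True when node j
                sends a message to node i. Every node is assumed to have
                at least one neighbor.
    Returns an (N, D) array of updated node features.
    """
    dim = node_feats.shape[1]
    # Pairwise affinity between nodes (scaled dot product, an assumption).
    scores = node_feats @ node_feats.T / np.sqrt(dim)
    # Mask out non-neighbors so they receive zero attention weight.
    scores = np.where(adjacency, scores, -np.inf)
    weights = softmax(scores, axis=1)
    # Each node receives a weighted average of its neighbors' features.
    messages = weights @ node_feats
    # Residual update keeps the node's own information.
    return node_feats + messages


# Hypothetical graph: nodes 0-1 are appearance features of two frames,
# nodes 2-3 are the corresponding optical-flow features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
adj = ~np.eye(4, dtype=bool)  # fully connected, no self-loops
updated = aggregate_messages(feats, adj)
print(updated.shape)
```

In this sketch each node is updated from all other nodes, so motion nodes can inform appearance nodes and vice versa; a hierarchical encoder would repeat such a step at several feature levels, but how HGPU does this is described in the paper and repository, not here.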