Liu Xiyao, Ji Zhong, Pang Yanwei, Han Jungong, Li Xuelong
IEEE Trans Cybern. 2022 Aug;52(8):7852-7864. doi: 10.1109/TCYB.2021.3049537. Epub 2022 Jul 19.
Few-shot learning (FSL) for human-object interaction (HOI) aims at recognizing various relationships between human actions and surrounding objects from only a few samples. It is a challenging vision task: the diversity and interactivity of human actions make it difficult to learn an adaptive classifier that captures ambiguous interclass information. Consequently, traditional FSL methods usually perform unsatisfactorily in complex HOI scenes. To this end, we propose dynamic graph-in-graph networks (DGIG-Net), a novel graph-prototype framework that learns a dynamic metric space by embedding a visual subgraph into a task-oriented cross-modal graph for few-shot HOI. Specifically, we first build a knowledge reconstruction graph that learns latent representations for HOI categories by reconstructing the relationships among visual features, generating visual representations under the category distribution of each task. Then, a dynamic relation graph integrates both the reconstructible visual nodes and dynamic task-oriented semantic information to explore a graph metric space for HOI class prototypes, exploiting discriminative information from the similarities among actions or objects. We validate DGIG-Net on multiple benchmark datasets, on which it largely outperforms existing FSL approaches and achieves state-of-the-art results.
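The abstract's core idea, refining class prototypes in a graph metric space built over a task's support samples, can be sketched minimally. The propagation rule, function names, and the nearest-prototype classifier below are illustrative assumptions for a generic graph-prototype scheme, not the DGIG-Net formulation itself:

```python
import numpy as np

def graph_prototypes(support, labels, adjacency, alpha=0.5):
    """Refine class prototypes by one step of feature propagation over a
    task graph, then average per class. All names and the mixing rule are
    illustrative assumptions, not the paper's exact method."""
    # One propagation step: mix each node with the mean of its neighbors.
    deg = adjacency.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # avoid division by zero for isolated nodes
    propagated = alpha * support + (1 - alpha) * (adjacency @ support) / deg
    # Class prototype = mean of propagated support embeddings per class.
    classes = np.unique(labels)
    protos = np.stack([propagated[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(query, prototypes, classes):
    # Nearest-prototype assignment in a Euclidean metric space.
    dists = np.linalg.norm(query[:, None, :] - prototypes[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]
```

In this sketch the adjacency matrix plays the role of the task-oriented relation graph: features are smoothed toward their graph neighbors before prototype averaging, so prototypes reflect within-task structure rather than raw per-class means.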