IEEE Trans Image Process. 2021;30:6583-6593. doi: 10.1109/TIP.2021.3096333. Epub 2021 Jul 21.
Human-Object Interaction (HOI) detection is an important task for understanding how humans interact with objects. Most existing works treat this task as an exhaustive ⟨human, verb, object⟩ triplet classification problem. In this paper, we decompose the problem and propose a novel two-stage graph model, the Interactiveness Proposal Graph Network (IPGN), which learns both interactiveness and interaction knowledge in one network. In the first stage, we design a fully connected graph for learning interactiveness, which determines whether a given human-object pair is interactive or not. Concretely, it generates interactiveness features that encode high-level semantic interactiveness knowledge for each pair. Class-agnostic interactiveness is a more general and simpler objective than interaction classification, and it provides reasonable proposals for graph construction in the second stage. In the second stage, a sparsely connected graph is constructed from all interactive pairs selected by the first stage. Specifically, we use the interactiveness knowledge to guide message passing. In contrast to feature similarity, interactiveness explicitly represents the connections between nodes. With this effective graph reasoning, the node features are well encoded for interaction learning. Experiments show that the proposed method achieves state-of-the-art performance on both the V-COCO and HICO-DET datasets.
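The two-stage flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dot-product-plus-sigmoid scorer, the 0.5 threshold, and the additive update rule are all assumptions chosen for illustration, standing in for the learned components of IPGN.

```python
import math
from itertools import product


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def interactiveness_scores(humans, objects):
    """Stage 1 (sketch): score every human-object pair, i.e. a fully connected graph.

    Assumption: a dot product followed by a sigmoid stands in for the
    learned class-agnostic interactiveness scorer.
    """
    return {
        (h, o): sigmoid(dot(h_feat, o_feat))
        for (h, h_feat), (o, o_feat) in product(humans.items(), objects.items())
    }


def select_pairs(scores, threshold=0.5):
    """Keep only pairs judged interactive; they become the sparse graph's edges.

    Assumption: a fixed threshold replaces whatever selection rule the paper uses.
    """
    return {pair: s for pair, s in scores.items() if s >= threshold}


def message_passing(humans, objects, edges):
    """Stage 2 (sketch): one round of message passing over the sparse graph.

    Each message is weighted by the pair's interactiveness score rather
    than by feature similarity. The additive update is an assumption.
    """
    new_humans = {h: list(f) for h, f in humans.items()}
    new_objects = {o: list(f) for o, f in objects.items()}
    for (h, o), weight in edges.items():
        for i, value in enumerate(objects[o]):
            new_humans[h][i] += weight * value
        for i, value in enumerate(humans[h]):
            new_objects[o][i] += weight * value
    return new_humans, new_objects


if __name__ == "__main__":
    # Toy features, hypothetical: one human, one nearby cup, one unrelated tree.
    humans = {"h0": [1.0, 0.0]}
    objects = {"cup": [2.0, 0.0], "tree": [-2.0, 0.0]}

    scores = interactiveness_scores(humans, objects)
    edges = select_pairs(scores)
    updated_humans, updated_objects = message_passing(humans, objects, edges)
    print(sorted(edges))
    print(updated_humans, updated_objects)
```

In this toy setup only the human-cup pair passes the threshold, so the tree node receives no messages and its features are unchanged. That is the intended effect of building the second-stage graph from interactive pairs only.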