Cascaded Parsing of Human-Object Interaction Recognition.

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):2827-2840. doi: 10.1109/TPAMI.2021.3049156. Epub 2022 May 5.

Abstract

This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images. Considering the intrinsic complexity and structural nature of the task, we introduce a cascaded parsing network (CP-HOI) for a multi-stage, structured HOI understanding. At each cascade stage, an instance detection module progressively refines HOI proposals and feeds them into a structured interaction reasoning module. Each of the two modules is also connected to its predecessor in the previous stage, enabling efficient cross-stage information propagation. The structured interaction reasoning module is built upon a graph parsing neural network (GPNN), which efficiently models potential HOI structures as graphs and mines rich context for comprehensive relation understanding. In particular, GPNN infers a parse graph that i) interprets meaningful HOI structures by a learnable adjacency matrix, and ii) predicts action (edge) labels. Within an end-to-end, message-passing framework, GPNN blends learning and inference, iteratively parsing HOI structures and reasoning HOI representations (i.e., instance and relation features). Further, beyond relation detection at a bounding-box level, we make our framework flexible to perform fine-grained pixel-wise relation segmentation; this provides a new glimpse into better relation modeling. A preliminary version of our CP-HOI model reached 1st place in the ICCV2019 Person in Context Challenge, on both relation detection and segmentation. In addition, our CP-HOI shows promising results on two popular HOI recognition benchmarks, i.e., V-COCO and HICO-DET.
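
The abstract describes the GPNN component as inferring a parse graph through a learnable adjacency matrix and predicting action (edge) labels within an end-to-end message-passing framework. The following is a minimal sketch of that style of structured reasoning; it is not the authors' CP-HOI implementation, and the module names, feature dimensions, sigmoid soft-adjacency, GRU-based node update, and the 117-action output size (the HICO-DET action vocabulary) are all assumptions made purely for illustration.

```python
# Illustrative sketch of GPNN-style structured interaction reasoning.
# NOT the authors' CP-HOI code; all names and dimensions are assumptions.
import torch
import torch.nn as nn


class GPNNSketch(nn.Module):
    """One message-passing stage over an HOI graph.

    Nodes are detected instances (humans / objects); a learnable soft
    adjacency matrix weights which edges carry messages, and pairwise
    representations are read out to predict action (edge) labels.
    """

    def __init__(self, node_dim=256, edge_dim=256, num_actions=117):
        super().__init__()
        # Scores how strongly a pair of instances should be linked.
        self.adjacency_fn = nn.Sequential(
            nn.Linear(2 * node_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )
        # Builds a message from a neighbour's state and the edge feature.
        self.message_fn = nn.Linear(node_dim + edge_dim, node_dim)
        # Updates a node state from its aggregated incoming messages.
        self.update_fn = nn.GRUCell(node_dim, node_dim)
        # Reads out action logits from a pairwise (edge) representation.
        self.readout_fn = nn.Linear(2 * node_dim, num_actions)

    def forward(self, node_feats, edge_feats, steps=2):
        # node_feats: (N, node_dim), edge_feats: (N, N, edge_dim)
        n = node_feats.size(0)
        h = node_feats
        for _ in range(steps):
            # 1) Infer the parse graph: a soft adjacency A in [0, 1].
            pair = torch.cat(
                [h.unsqueeze(1).expand(n, n, -1),
                 h.unsqueeze(0).expand(n, n, -1)], dim=-1)   # (N, N, 2*node_dim)
            adj = torch.sigmoid(self.adjacency_fn(pair))     # (N, N, 1)

            # 2) Pass messages along edges, weighted by the learned adjacency.
            msgs = self.message_fn(
                torch.cat([h.unsqueeze(0).expand(n, n, -1), edge_feats],
                          dim=-1))                           # (N, N, node_dim)
            agg = (adj * msgs).sum(dim=1)                    # (N, node_dim)

            # 3) Update node (instance) representations.
            h = self.update_fn(agg, h)

        # 4) Predict action labels for every instance pair (edge readout).
        pair = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1),
             h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        action_logits = self.readout_fn(pair)                # (N, N, num_actions)
        return h, action_logits
```

In the full cascaded model described above, several such reasoning stages would run in sequence, each consuming refined instance proposals from its instance detection module and the representations produced by the previous stage.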
