Cascaded Parsing of Human-Object Interaction Recognition.

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):2827-2840. doi: 10.1109/TPAMI.2021.3049156. Epub 2022 May 5.

Abstract

This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images. Considering the intrinsic complexity and structural nature of the task, we introduce a cascaded parsing network (CP-HOI) for a multi-stage, structured HOI understanding. At each cascade stage, an instance detection module progressively refines HOI proposals and feeds them into a structured interaction reasoning module. Each of the two modules is also connected to its predecessor in the previous stage, enabling efficient cross-stage information propagation. The structured interaction reasoning module is built upon a graph parsing neural network (GPNN), which efficiently models potential HOI structures as graphs and mines rich context for comprehensive relation understanding. In particular, GPNN infers a parse graph that i) interprets meaningful HOI structures by a learnable adjacency matrix, and ii) predicts action (edge) labels. Within an end-to-end, message-passing framework, GPNN blends learning and inference, iteratively parsing HOI structures and reasoning HOI representations (i.e., instance and relation features). Further, beyond relation detection at a bounding-box level, we make our framework flexible to perform fine-grained pixel-wise relation segmentation; this provides a new glimpse into better relation modeling. A preliminary version of our CP-HOI model reached 1st place in the ICCV2019 Person in Context Challenge, on both relation detection and segmentation. In addition, our CP-HOI shows promising results on two popular HOI recognition benchmarks, i.e., V-COCO and HICO-DET.
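
The abstract describes the GPNN component as inferring a parse graph through a learnable adjacency matrix and predicting action (edge) labels within an end-to-end message-passing framework. The following is a minimal sketch of that style of structured reasoning; it is not the authors' CP-HOI implementation, and the module names, feature dimensions, sigmoid soft-adjacency, GRU-based node update, and the 117-action output size (the HICO-DET action vocabulary) are all assumptions made purely for illustration.

```python
# Illustrative sketch of GPNN-style structured interaction reasoning.
# NOT the authors' CP-HOI code; all names and dimensions are assumptions.
import torch
import torch.nn as nn


class GPNNSketch(nn.Module):
    """One message-passing stage over an HOI graph.

    Nodes are detected instances (humans / objects); a learnable soft
    adjacency matrix weights which edges carry messages, and pairwise
    representations are read out to predict action (edge) labels.
    """

    def __init__(self, node_dim=256, edge_dim=256, num_actions=117):
        super().__init__()
        # Scores how strongly a pair of instances should be linked.
        self.adjacency_fn = nn.Sequential(
            nn.Linear(2 * node_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )
        # Builds a message from a neighbour's state and the edge feature.
        self.message_fn = nn.Linear(node_dim + edge_dim, node_dim)
        # Updates a node state from its aggregated incoming messages.
        self.update_fn = nn.GRUCell(node_dim, node_dim)
        # Reads out action logits from a pairwise (edge) representation.
        self.readout_fn = nn.Linear(2 * node_dim, num_actions)

    def forward(self, node_feats, edge_feats, steps=2):
        # node_feats: (N, node_dim), edge_feats: (N, N, edge_dim)
        n = node_feats.size(0)
        h = node_feats
        for _ in range(steps):
            # 1) Infer the parse graph: a soft adjacency A in [0, 1].
            pair = torch.cat(
                [h.unsqueeze(1).expand(n, n, -1),
                 h.unsqueeze(0).expand(n, n, -1)], dim=-1)   # (N, N, 2*node_dim)
            adj = torch.sigmoid(self.adjacency_fn(pair))     # (N, N, 1)

            # 2) Pass messages along edges, weighted by the learned adjacency.
            msgs = self.message_fn(
                torch.cat([h.unsqueeze(0).expand(n, n, -1), edge_feats],
                          dim=-1))                           # (N, N, node_dim)
            agg = (adj * msgs).sum(dim=1)                    # (N, node_dim)

            # 3) Update node (instance) representations.
            h = self.update_fn(agg, h)

        # 4) Predict action labels for every instance pair (edge readout).
        pair = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1),
             h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        action_logits = self.readout_fn(pair)                # (N, N, num_actions)
        return h, action_logits
```

In the full cascaded model described above, several such reasoning stages would run in sequence, each consuming refined instance proposals from its instance detection module and the representations produced by the previous stage.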
