Human-Object Interaction detection via Global Context and Pairwise-level Fusion Features Integration.

Affiliations

Dalian University of Technology, Dalian, 116024, Liaoning, China.

Publication

Neural Netw. 2024 Feb;170:242-253. doi: 10.1016/j.neunet.2023.11.002. Epub 2023 Nov 13.

DOI: 10.1016/j.neunet.2023.11.002
PMID: 37995546
Abstract

Recent two-stage detector-based methods, together with the successful application of the transformer, show superiority in Human-Object Interaction (HOI) detection. However, these methods extract global contextual features only through instance-level attention, without considering the perspective of human-object interaction pairs, and the fusion enhancement of interaction-pair features lacks further exploration. Guiding global context extraction with human-object interaction pairs, rather than with instances alone, makes fuller use of the semantics between human-object pairs, which helps HOI recognition. To this end, we propose a two-stage Global Context and Pairwise-level Fusion Features Integration Network (GFIN) for HOI detection. Specifically, the first stage employs an object detector for instance feature extraction. The second stage captures semantically rich visual information through three proposed modules: a Global Contextual Feature Extraction Encoder (GCE), a Pairwise Interaction Query Decoder (PID), and a Human-Object Pairwise-level Attention Fusion Module (HOF). The GCE module extracts the global context memory with a proposed crossover-residual mechanism and then integrates it with the local instance memory from the DETR object detector. HOF uses a proposed pairwise-level attention mechanism to fuse and enhance the first stage's multi-layer features. PID takes the query sequence from HOF and the memory from GCE as input and outputs multi-label interaction recognition results. Finally, comprehensive experiments on the HICO-DET and V-COCO datasets demonstrate that the proposed GFIN significantly outperforms state-of-the-art methods. Code is available at https://github.com/ddwhzh/GFIN.
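The data flow the abstract describes can be sketched in a few lines: build one query per detected human-object pair, fuse the pair queries with self-attention (in the spirit of HOF), and let them cross-attend to a global context memory before a multi-label verb head (in the spirit of PID). This is a minimal NumPy sketch under illustrative dimensions; all function names, shapes, and the random weights are assumptions for exposition, not the authors' GFIN implementation (which is in the linked repository).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: softmax(QK^T / sqrt(d)) V
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def pairwise_fusion(human_feats, object_feats):
    # HOF-style step (sketch): one query per human-object pair,
    # then self-attention so pair queries exchange information.
    pairs = np.stack([np.concatenate([h, o])
                      for h in human_feats for o in object_feats])
    return attention(pairs, pairs, pairs)

def decode_interactions(pair_queries, global_memory, num_verbs, rng):
    # PID-style step (sketch): pair queries cross-attend to the
    # global context memory, then a sigmoid head scores each verb
    # per pair (multi-label interaction recognition).
    ctx = attention(pair_queries, global_memory, global_memory)
    W = rng.standard_normal((ctx.shape[-1], num_verbs))  # toy weights
    return 1 / (1 + np.exp(-(ctx @ W)))

rng = np.random.default_rng(0)
d = 8
humans = rng.standard_normal((2, d))       # 2 detected humans
objects = rng.standard_normal((3, d))      # 3 detected objects
memory = rng.standard_normal((10, 2 * d))  # stand-in global context tokens

pair_q = pairwise_fusion(humans, objects)  # (6, 16): 2x3 pair queries
scores = decode_interactions(pair_q, memory, num_verbs=5, rng=rng)
print(scores.shape)  # (6, 5): one verb-probability vector per pair
```

The sketch only conveys the tensor shapes and attention pattern; it omits the crossover-residual mechanism in GCE and the multi-layer feature enhancement that the paper credits for its gains.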


Similar articles

1. Pairwise CNN-Transformer Features for Human-Object Interaction Detection.
Entropy (Basel). 2024 Feb 27;26(3):205. doi: 10.3390/e26030205.
2. Point-Based Learnable Query Generator for Human-Object Interaction Detection.
IEEE Trans Image Process. 2023;32:6469-6484. doi: 10.1109/TIP.2023.3334100. Epub 2023 Dec 1.
3. Learning Human-Object Interaction via Interactive Semantic Reasoning.
IEEE Trans Image Process. 2021;30:9294-9305. doi: 10.1109/TIP.2021.3125258. Epub 2021 Nov 12.
4. FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection.
IEEE Trans Pattern Anal Mach Intell. 2024 Apr;46(4):2415-2429. doi: 10.1109/TPAMI.2023.3331738. Epub 2024 Mar 6.
5. ERNet: An Efficient and Reliable Human-Object Interaction Detection Network.
IEEE Trans Image Process. 2023;32:964-979. doi: 10.1109/TIP.2022.3231528.
6. A Novel Part Refinement Tandem Transformer for Human-Object Interaction Detection.
Sensors (Basel). 2024 Jul 1;24(13):4278. doi: 10.3390/s24134278.
7. Transferable Interactiveness Knowledge for Human-Object Interaction Detection.
IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3870-3882. doi: 10.1109/TPAMI.2021.3054048. Epub 2022 Jun 3.
8. Toward a Unified Transformer-Based Framework for Scene Graph Generation and Human-Object Interaction Detection.
IEEE Trans Image Process. 2023;32:6274-6288. doi: 10.1109/TIP.2023.3330304. Epub 2023 Nov 20.
9. Learning global dependencies and multi-semantics within heterogeneous graph for predicting disease-related lncRNAs.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac361.