
Pairwise CNN-Transformer Features for Human-Object Interaction Detection.

Author Information

Quan Hutuo, Lai Huicheng, Gao Guxue, Ma Jun, Li Junkai, Chen Dongji

Affiliations

College of Computer Science and Technology, Xinjiang University, Urumqi 830017, China.

Xinjiang Key Laboratory of Signal Detection and Processing, Xinjiang University, Urumqi 830017, China.

Publication Information

Entropy (Basel). 2024 Feb 27;26(3):205. doi: 10.3390/e26030205.

DOI: 10.3390/e26030205
PMID: 38539717
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10969608/
Abstract

Human-object interaction (HOI) detection aims to localize and recognize the relationship between humans and objects, which helps computers understand high-level semantics. In HOI detection, two-stage and one-stage methods have distinct advantages and disadvantages. The two-stage methods can obtain high-quality human-object pair features based on object detection but lack contextual information. The one-stage transformer-based methods can model good global features but cannot benefit from object detection. The ideal model should have the advantages of both methods. Therefore, we propose the Pairwise Convolutional neural network (CNN)-Transformer (PCT), a simple and effective two-stage method. The model both fully utilizes the object detector and has rich contextual information. Specifically, we obtain pairwise CNN features from the CNN backbone. These features are fused with pairwise transformer features to enhance the pairwise representations. The enhanced representations are superior to using CNN and transformer features individually. In addition, the global features of the transformer provide valuable contextual cues. We fairly compare the performance of pairwise CNN and pairwise transformer features in HOI detection. The experimental results show that the previously neglected CNN features still have a significant edge. Compared to state-of-the-art methods, our model achieves competitive results on the HICO-DET and V-COCO datasets.
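The abstract describes the core of the PCT architecture: pairwise CNN features (ROI-pooled from the backbone for each human-object pair) are fused with pairwise transformer features to produce an enhanced pair representation. The exact fusion operator is not specified in this record, so the following is a minimal PyTorch-style sketch assuming projection, concatenation, and an MLP; the class and parameter names (PairwiseFusion, cnn_dim, tr_dim, out_dim) are illustrative assumptions, not the paper's actual module.

```python
# Minimal sketch of the pairwise CNN/transformer feature fusion idea from the abstract.
# ASSUMPTION: the fusion operator (projection + concat + MLP) and all names below are
# illustrative; the paper's actual PCT module may differ.
import torch
import torch.nn as nn


class PairwiseFusion(nn.Module):
    """Fuse pairwise CNN features with pairwise transformer features."""

    def __init__(self, cnn_dim: int = 2048, tr_dim: int = 256, out_dim: int = 512):
        super().__init__()
        self.proj_cnn = nn.Linear(cnn_dim, out_dim)  # project ROI-pooled CNN pair features
        self.proj_tr = nn.Linear(tr_dim, out_dim)    # project transformer pair features
        self.mlp = nn.Sequential(
            nn.Linear(2 * out_dim, out_dim),
            nn.ReLU(inplace=True),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, pair_cnn: torch.Tensor, pair_tr: torch.Tensor) -> torch.Tensor:
        # pair_cnn: [num_pairs, cnn_dim], pair_tr: [num_pairs, tr_dim]
        fused = torch.cat([self.proj_cnn(pair_cnn), self.proj_tr(pair_tr)], dim=-1)
        return self.mlp(fused)  # enhanced pairwise representation, [num_pairs, out_dim]


# Example: 8 candidate human-object pairs
pairs_cnn = torch.randn(8, 2048)
pairs_tr = torch.randn(8, 256)
enhanced = PairwiseFusion()(pairs_cnn, pairs_tr)
print(enhanced.shape)  # torch.Size([8, 512])
```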

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0da0/10969608/8a8878c49842/entropy-26-00205-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0da0/10969608/f7999f4751cf/entropy-26-00205-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0da0/10969608/a43e0ef96da5/entropy-26-00205-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0da0/10969608/d60de0c856fc/entropy-26-00205-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0da0/10969608/a1a0dc3a58bb/entropy-26-00205-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0da0/10969608/cad21480bc1c/entropy-26-00205-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0da0/10969608/6f1edf340610/entropy-26-00205-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0da0/10969608/04476265a80a/entropy-26-00205-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0da0/10969608/451590964b49/entropy-26-00205-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0da0/10969608/87199ade2463/entropy-26-00205-g010.jpg

Similar Articles

1. Pairwise CNN-Transformer Features for Human-Object Interaction Detection.
Entropy (Basel). 2024 Feb 27;26(3):205. doi: 10.3390/e26030205.
2. Human-Object Interaction detection via Global Context and Pairwise-level Fusion Features Integration.
Neural Netw. 2024 Feb;170:242-253. doi: 10.1016/j.neunet.2023.11.002. Epub 2023 Nov 13.
3. CVTrack: Combined Convolutional Neural Network and Vision Transformer Fusion Model for Visual Tracking.
Sensors (Basel). 2024 Jan 3;24(1):274. doi: 10.3390/s24010274.
4. A Novel Part Refinement Tandem Transformer for Human-Object Interaction Detection.
Sensors (Basel). 2024 Jul 1;24(13):4278. doi: 10.3390/s24134278.
5. Toward a Unified Transformer-Based Framework for Scene Graph Generation and Human-Object Interaction Detection.
IEEE Trans Image Process. 2023;32:6274-6288. doi: 10.1109/TIP.2023.3330304. Epub 2023 Nov 20.
6. Edge Preserving and Multi-Scale Contextual Neural Network for Salient Object Detection.
IEEE Trans Image Process. 2018;27(1):121-134. doi: 10.1109/TIP.2017.2756825.
7. FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection.
IEEE Trans Pattern Anal Mach Intell. 2024 Apr;46(4):2415-2429. doi: 10.1109/TPAMI.2023.3331738. Epub 2024 Mar 6.
8. Point-Based Learnable Query Generator for Human-Object Interaction Detection.
IEEE Trans Image Process. 2023;32:6469-6484. doi: 10.1109/TIP.2023.3334100. Epub 2023 Dec 1.
9. Dual encoder network with transformer-CNN for multi-organ segmentation.
Med Biol Eng Comput. 2023 Mar;61(3):661-671. doi: 10.1007/s11517-022-02723-9. Epub 2022 Dec 29.
10. Grounding human-object interaction to affordance behavior in multimodal datasets.
Front Artif Intell. 2023 Jan 30;6:1084740. doi: 10.3389/frai.2023.1084740. eCollection 2023.

References Cited in This Article

1. Infrared Image Caption Based on Object-Oriented Attention.
Entropy (Basel). 2023 May 22;25(5):826. doi: 10.3390/e25050826.
2. Driving Behavior Recognition Algorithm Combining Attention Mechanism and Lightweight Network.
Entropy (Basel). 2022 Jul 16;24(7):984. doi: 10.3390/e24070984.
3. Optical Flow-Aware-Based Multi-Modal Fusion Network for Violence Detection.
Entropy (Basel). 2022 Jul 6;24(7):939. doi: 10.3390/e24070939.
4. Deep Feature Space: A Geometrical Perspective.
IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):6823-6838. doi: 10.1109/TPAMI.2021.3094625. Epub 2022 Sep 14.
5. Transferable Interactiveness Knowledge for Human-Object Interaction Detection.
IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3870-3882. doi: 10.1109/TPAMI.2021.3054048. Epub 2022 Jun 3.