
A Novel Part Refinement Tandem Transformer for Human-Object Interaction Detection.

Author information

Su Zhan, Yang Hongzhe

Affiliation

School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.

Publication information

Sensors (Basel). 2024 Jul 1;24(13):4278. doi: 10.3390/s24134278.

Abstract

Human-object interaction (HOI) detection identifies a "set of interactions" in an image, which involves recognizing the interacting instances and classifying the interaction categories. The complexity and variety of image content make this task challenging. Recently, the Transformer has been applied in computer vision and has received attention in the HOI detection task. This paper therefore proposes a novel Part Refinement Tandem Transformer (PRTT) for HOI detection. Unlike previous Transformer-based HOI methods, PRTT uses multiple decoders to split and process the rich elements of an HOI prediction, and introduces a new part state feature extraction (PSFE) module to help improve the final interaction category classification. We adopt a novel prior feature integrated cross-attention (PFIC) that uses the fine-grained part state semantic and appearance features output by the PSFE module to guide the queries. We validate our method on two public datasets, V-COCO and HICO-DET. Compared with state-of-the-art models, PRTT significantly improves HOI detection performance.
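
The abstract does not say how PFIC combines the PSFE features with the decoder queries. As a rough illustration only, the sketch below assumes the simplest possible fusion: each query is shifted by a prior feature vector (element-wise addition) before ordinary single-head scaled dot-product cross-attention over image features. The function name `prior_guided_cross_attention`, the additive fusion, and the toy inputs are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def prior_guided_cross_attention(queries, priors, keys, values):
    """Single-head cross-attention with prior-shifted queries.

    Assumption (not from the paper): the prior feature is fused
    into the query by element-wise addition.
    """
    d = len(keys[0])
    outputs = []
    for q, p in zip(queries, priors):
        # Hypothetical fusion step: add the prior to the query.
        guided = [qi + pi for qi, pi in zip(q, p)]
        # Standard scaled dot-product attention over the keys.
        scores = [dot(guided, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy usage: a zero query attends uniformly, while a prior aligned
# with the first key pulls attention toward the first value.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
uniform = prior_guided_cross_attention([[0.0, 0.0]], [[0.0, 0.0]], keys, values)
guided = prior_guided_cross_attention([[0.0, 0.0]], [[1.0, 0.0]], keys, values)
```

In this toy setup the prior acts only as a bias on where the query looks. A real model would use learned projections and multiple heads, and the paper's actual PFIC design may differ substantially from this additive form.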

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bbd0/11244048/ac675ad779ec/sensors-24-04278-g001.jpg
