ERNet: An Efficient and Reliable Human-Object Interaction Detection Network.

Authors

Lim JunYi, Baskaran Vishnu Monn, Lim Joanne Mun-Yee, Wong KokSheik, See John, Tistarelli Massimo

Publication information

IEEE Trans Image Process. 2023;32:964-979. doi: 10.1109/TIP.2022.3231528.

DOI: 10.1109/TIP.2022.3231528

PMID: 37022006

Abstract

Human-Object Interaction (HOI) detection recognizes how people interact with objects, which is advantageous in autonomous systems such as self-driving vehicles and collaborative robots. However, current HOI detectors are often plagued by model inefficiency and unreliability when making a prediction, which consequently limits their potential in real-world scenarios. In this paper, we address these challenges by proposing ERNet, an end-to-end trainable convolutional-transformer network for HOI detection. The proposed model employs an efficient multi-scale deformable attention to effectively capture vital HOI features. We also put forward a novel detection attention module to adaptively generate semantically rich instance and interaction tokens. These tokens undergo pre-emptive detections to produce initial region and vector proposals that also serve as queries, which enhances the feature refinement process in the transformer decoders. Several impactful enhancements are also applied to improve the HOI representation learning. Additionally, we utilize a predictive uncertainty estimation framework in the instance and interaction classification heads to quantify the uncertainty behind each prediction. By doing so, we can accurately and reliably predict HOIs even under challenging scenarios. Experimental results on the HICO-Det, V-COCO, and HOI-A datasets demonstrate that the proposed model achieves state-of-the-art performance in detection accuracy and training efficiency. Code is publicly available at https://github.com/Monash-CyPhi-AI-Research-Lab/ernet.

