Forecasting Action Through Contact Representations From First Person Video.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6703-6714. doi: 10.1109/TPAMI.2021.3055233. Epub 2023 May 5.

DOI:10.1109/TPAMI.2021.3055233

Abstract

Human actions involving hand manipulations are structured according to the making and breaking of hand-object contact, and human visual understanding of action relies on anticipating contact, as demonstrated by pioneering work in cognitive science. Taking inspiration from this, we introduce representations and models centered on contact, which we then use in action prediction and anticipation. We annotate a subset of the EPIC Kitchens dataset to include time-to-contact between hands and objects, as well as segmentations of hands and objects. Using these annotations we train the Anticipation Module, a module producing Contact Anticipation Maps and Next Active Object Segmentations - novel low-level representations providing temporal and spatial characteristics of anticipated near-future action. On top of the Anticipation Module we apply Egocentric Object Manipulation Graphs (Ego-OMG), a framework for action anticipation and prediction. Ego-OMG models longer-term temporal semantic relations through a graph modeling transitions between contact-delineated action states. Use of the Anticipation Module within Ego-OMG produces state-of-the-art results, achieving 1st and 2nd place on the unseen and seen test sets, respectively, of the EPIC Kitchens Action Anticipation Challenge, and achieving state-of-the-art results on the tasks of action anticipation and action prediction over EPIC Kitchens. We perform ablation studies over characteristics of the Anticipation Module to evaluate their utility.
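To make the idea of "a graph modeling transitions between contact-delineated action states" more concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes, purely for illustration, that a state is the pair of objects held in the left and right hand, and that the graph is a table of transition counts; the function names `build_transition_graph` and `most_likely_next` and the toy sequences are hypothetical and do not come from the abstract.

```python
from collections import Counter, defaultdict

# ASSUMPTION (illustrative only): a state is the pair
# (object in left hand, object in right hand); None means an empty hand.


def build_transition_graph(sequences):
    """Count transitions between consecutive, distinct contact states."""
    graph = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            # A new state begins only when contact is made or broken.
            if prev != nxt:
                graph[prev][nxt] += 1
    return graph


def most_likely_next(graph, state):
    """Return the most frequent successor of `state`, or None if unseen."""
    successors = graph.get(state)
    if not successors:
        return None
    return successors.most_common(1)[0][0]


# Hypothetical toy data: three short sequences of contact states.
toy_sequences = [
    [(None, None), (None, "knife"), ("onion", "knife"), (None, "knife")],
    [(None, None), (None, "knife"), ("onion", "knife"), ("onion", None)],
    [(None, None), (None, "pan"), (None, None)],
]

graph = build_transition_graph(toy_sequences)
print(most_likely_next(graph, (None, None)))
```

In this toy setting, anticipation reduces to looking up the most frequent successor of the current contact state. The framework described in the abstract additionally relies on the learned Contact Anticipation Maps and Next Active Object Segmentations, which this sketch does not attempt to model.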

