Forecasting Action Through Contact Representations From First Person Video.

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6703-6714. doi: 10.1109/TPAMI.2021.3055233. Epub 2023 May 5.

DOI: 10.1109/TPAMI.2021.3055233
PMID: 33507864
Abstract

Human actions involving hand manipulations are structured according to the making and breaking of hand-object contact, and human visual understanding of action relies on the anticipation of contact, as demonstrated by pioneering work in cognitive science. Taking inspiration from this, we introduce representations and models centered on contact, which we then use in action prediction and anticipation. We annotate a subset of the EPIC Kitchens dataset to include time-to-contact between hands and objects, as well as segmentations of hands and objects. Using these annotations we train the Anticipation Module, a module producing Contact Anticipation Maps and Next Active Object Segmentations - novel low-level representations providing temporal and spatial characteristics of anticipated near-future action. On top of the Anticipation Module we apply Egocentric Object Manipulation Graphs (Ego-OMG), a framework for action anticipation and prediction. Ego-OMG models longer-term temporal semantic relations through the use of a graph modeling transitions between contact-delineated action states. Use of the Anticipation Module within Ego-OMG produces state-of-the-art results, achieving 1st and 2nd place on the unseen and seen test sets, respectively, of the EPIC Kitchens Action Anticipation Challenge, and achieving state-of-the-art results on the tasks of action anticipation and action prediction over EPIC Kitchens. We perform ablation studies over characteristics of the Anticipation Module to evaluate their utility.
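The abstract describes a two-headed design: a shared encoder whose features feed a contact anticipation head (a per-pixel time-to-contact map) and a next-active-object segmentation head. As a rough illustration only, here is a minimal PyTorch-style sketch of that two-output structure; the class and layer names (AnticipationModuleSketch, contact_head, segmentation_head) and all layer sizes are hypothetical assumptions, not the authors' published architecture.

```python
# Minimal, illustrative sketch of a two-headed anticipation network:
# a shared encoder feeds (1) a contact anticipation head that regresses
# a per-pixel time-to-contact map and (2) a next-active-object
# segmentation head that produces mask logits. Sizes and names are
# assumptions for illustration, not the published architecture.
import torch
import torch.nn as nn


class AnticipationModuleSketch(nn.Module):
    def __init__(self, in_channels: int = 3, base: int = 32):
        super().__init__()
        # Shared convolutional encoder over a single RGB frame.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, base, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(base, base * 2, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Contact anticipation head: non-negative time-to-contact per pixel.
        self.contact_head = nn.Sequential(
            nn.Conv2d(base * 2, 1, kernel_size=1),
            nn.Softplus(),
        )
        # Next-active-object head: per-pixel segmentation mask logits.
        self.segmentation_head = nn.Conv2d(base * 2, 1, kernel_size=1)

    def forward(self, frame: torch.Tensor):
        features = self.encoder(frame)
        return self.contact_head(features), self.segmentation_head(features)


# Toy usage: one 64x64 RGB frame in, two dense maps out.
model = AnticipationModuleSketch()
frame = torch.randn(1, 3, 64, 64)
ttc_map, nao_logits = model(frame)
print(ttc_map.shape, nao_logits.shape)  # torch.Size([1, 1, 64, 64]) for both
```

The sketch covers only the two dense outputs named in the abstract; the actual module operates on video rather than single frames, and how its outputs are consumed by the Ego-OMG graph of contact-delineated action states is beyond this snippet.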

Similar Articles

1. Forecasting Action Through Contact Representations From First Person Video.
IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6703-6714. doi: 10.1109/TPAMI.2021.3055233. Epub 2023 May 5.
2. Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video.
IEEE Trans Pattern Anal Mach Intell. 2021 Nov;43(11):4021-4036. doi: 10.1109/TPAMI.2020.2992889. Epub 2021 Oct 1.
3. Learning to Anticipate Egocentric Actions by Imagination.
IEEE Trans Image Process. 2021;30:1143-1152. doi: 10.1109/TIP.2020.3040521. Epub 2020 Dec 17.
4. The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines.
IEEE Trans Pattern Anal Mach Intell. 2021 Nov;43(11):4125-4141. doi: 10.1109/TPAMI.2020.2991965. Epub 2021 Oct 1.
5. Multi-Label Action Anticipation for Real-World Videos With Scene Understanding.
IEEE Trans Image Process. 2024;33:3242-3255. doi: 10.1109/TIP.2024.3391692. Epub 2024 May 9.
6. Action Anticipation Using Pairwise Human-Object Interactions and Transformers.
IEEE Trans Image Process. 2021;30:8116-8129. doi: 10.1109/TIP.2021.3113114. Epub 2021 Sep 27.
7. Learning to Recognize Actions on Objects in Egocentric Video With Attention Dictionaries.
IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6674-6687. doi: 10.1109/TPAMI.2021.3058649. Epub 2023 May 5.
8. Multi-Dataset, Multitask Learning of Egocentric Vision Tasks.
IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6618-6630. doi: 10.1109/TPAMI.2021.3061479. Epub 2023 May 5.
9. Relational Reasoning Over Spatial-Temporal Graphs for Video Summarization.
IEEE Trans Image Process. 2022;31:3017-3031. doi: 10.1109/TIP.2022.3163855. Epub 2022 Apr 11.
10. Adaptive Spatio-Temporal Graph Enhanced Vision-Language Representation for Video QA.
IEEE Trans Image Process. 2021;30:5477-5489. doi: 10.1109/TIP.2021.3076556. Epub 2021 Jun 11.

Cited By

1. Editorial: Enhanced human modeling in robotics for socially-aware place navigation.
Front Robot AI. 2024 Mar 1;11:1348022. doi: 10.3389/frobt.2024.1348022. eCollection 2024.
2. 3D network with channel excitation and knowledge distillation for action recognition.
Front Neurorobot. 2023 Mar 23;17:1050167. doi: 10.3389/fnbot.2023.1050167. eCollection 2023.