
Suppr 超能文献




Egocentric Action Recognition by Automatic Relation Modeling.

Authors

Li Haoxin, Zheng Wei-Shi, Zhang Jianguo, Hu Haifeng, Lu Jiwen, Lai Jian-Huang

Publication

IEEE Trans Pattern Anal Mach Intell. 2023 Jan;45(1):489-507. doi: 10.1109/TPAMI.2022.3148790. Epub 2022 Dec 5.

DOI: 10.1109/TPAMI.2022.3148790
PMID: 35130146
Abstract

Egocentric videos, which record the daily activities of individuals from a first-person point of view, have attracted increasing attention in recent years because of their growing use in many popular applications, including life logging, health monitoring and virtual reality. As a fundamental problem in egocentric vision, egocentric action recognition aims to recognize the actions of the camera wearer from egocentric videos. In egocentric action recognition, relation modeling is important, because the interactions between the camera wearer and the recorded persons or objects form complex relations in egocentric videos. However, only a few existing methods model the relations between the camera wearer and the interacting persons for egocentric action recognition, and they require prior knowledge or auxiliary data to localize the interacting persons. In this work, we consider modeling the relations in a weakly supervised manner, i.e., without using annotations or prior knowledge about the interacting persons or objects, for egocentric action recognition. We form a weakly supervised framework that unifies automatic interactor localization and explicit relation modeling for the purpose of automatic relation modeling. First, we learn to automatically localize the interactors, i.e., the body parts of the camera wearer and the persons or objects that the camera wearer interacts with, by learning a series of keypoints directly from video data to localize the action-relevant regions with only action labels and some constraints on these keypoints. Second, and more importantly, to explicitly model the relations between the interactors, we develop an ego-relational LSTM (long short-term memory) network with several candidate connections to model the complex relations in egocentric videos, such as the temporal, interactive, and contextual relations.
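Learning keypoints from video with only action-level supervision typically relies on a differentiable localization step so that gradients from the action loss can move the keypoints. A common choice for this is a soft-argmax over a 2-D score map; the sketch below illustrates that generic mechanism only (the function name and the plain-Python simplification are ours, not the paper's implementation):

```python
import math

def soft_argmax_2d(heatmap):
    """Differentiable keypoint localization: softmax over a 2-D score map,
    then the expected (x, y) coordinate under the resulting distribution.
    `heatmap` is a list of rows of raw scores for one candidate keypoint."""
    flat = [v for row in heatmap for v in row]
    m = max(flat)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in flat]
    z = sum(exps)
    probs = [e / z for e in exps]
    h, w = len(heatmap), len(heatmap[0])
    # Expected column (x) and row (y) under the softmax distribution.
    x = sum(probs[r * w + c] * c for r in range(h) for c in range(w))
    y = sum(probs[r * w + c] * r for r in range(h) for c in range(w))
    return x, y
```

Because every step is smooth, an action-classification loss (plus constraints on the keypoints) can shape the score maps end to end, without any keypoint annotations.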
In particular, to reduce the human effort and manual intervention needed to construct an optimal ego-relational LSTM structure, we search for the optimal connections with a differentiable network architecture search mechanism, which automatically constructs the ego-relational LSTM network to explicitly model different relations for egocentric action recognition. Extensive experiments on egocentric video datasets illustrate the effectiveness of our method.
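A differentiable architecture search over candidate connections is usually a DARTS-style continuous relaxation: rather than committing to one connection, the network computes a softmax-weighted mixture of all candidates, and the mixture weights are trained by gradient descent alongside the network weights. A minimal sketch of that relaxation (scalar inputs and function names are our simplification, not the paper's code):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of architecture parameters."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def mixed_connection(candidate_outputs, alphas):
    """DARTS-style relaxation: instead of hard-selecting one candidate
    connection (e.g. temporal, interactive, contextual), return a convex
    combination of all candidates weighted by softmax(alphas). The alphas
    are learnable; after search, only the argmax connection is retained."""
    weights = softmax(alphas)
    return sum(w * out for w, out in zip(weights, candidate_outputs))
```

With equal alphas the mixture is a plain average; as training pushes one alpha up, the mixture converges to that single connection, which is the one kept in the final discrete architecture.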


Similar Articles

1. Egocentric Action Recognition by Automatic Relation Modeling. IEEE Trans Pattern Anal Mach Intell. 2023 Jan;45(1):489-507. doi: 10.1109/TPAMI.2022.3148790. Epub 2022 Dec 5.
2. Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video. IEEE Trans Pattern Anal Mach Intell. 2021 Nov;43(11):4021-4036. doi: 10.1109/TPAMI.2020.2992889. Epub 2021 Oct 1.
3. Learning to Recognize Actions on Objects in Egocentric Video With Attention Dictionaries. IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6674-6687. doi: 10.1109/TPAMI.2021.3058649. Epub 2023 May 5.
4. Together Recognizing, Localizing and Summarizing Actions in Egocentric Videos. IEEE Trans Image Process. 2021;30:4330-4340. doi: 10.1109/TIP.2021.3070732. Epub 2021 Apr 16.
5. DANet: Semi-supervised differentiated auxiliaries guided network for video action recognition. Neural Netw. 2023 Jan;158:121-131. doi: 10.1016/j.neunet.2022.11.009. Epub 2022 Nov 17.
6. Egocentric Temporal Action Proposals. IEEE Trans Image Process. 2018 Feb;27(2):764-777. doi: 10.1109/TIP.2017.2772904.
7. Deep Attention Network for Egocentric Action Recognition. IEEE Trans Image Process. 2019 Aug;28(8):3703-3713. doi: 10.1109/TIP.2019.2901707. Epub 2019 Feb 26.
8. Delving into Egocentric Actions. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2015 Jun;2015:287-295. doi: 10.1109/CVPR.2015.7298625.
9. Analysis of the Hands in Egocentric Vision: A Survey. IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6846-6866. doi: 10.1109/TPAMI.2020.2986648. Epub 2023 May 5.
10. A Multi-Modal Egocentric Activity Recognition Approach towards Video Domain Generalization. Sensors (Basel). 2024 Apr 12;24(8):2491. doi: 10.3390/s24082491.