
Egocentric Action Recognition by Automatic Relation Modeling

Authors

Li Haoxin, Zheng Wei-Shi, Zhang Jianguo, Hu Haifeng, Lu Jiwen, Lai Jian-Huang

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2023 Jan;45(1):489-507. doi: 10.1109/TPAMI.2022.3148790. Epub 2022 Dec 5.

Abstract

Egocentric videos, which record the daily activities of individuals from a first-person point of view, have attracted increasing attention in recent years because of their growing use in many popular applications, including life logging, health monitoring and virtual reality. As a fundamental problem in egocentric vision, egocentric action recognition aims to recognize the actions of the camera wearer from egocentric videos. In egocentric action recognition, relation modeling is important, because the interactions between the camera wearer and the recorded persons or objects form complex relations in egocentric videos. However, only a few existing methods model the relations between the camera wearer and the interacting persons for egocentric action recognition, and moreover, they require prior knowledge or auxiliary data to localize the interacting persons. In this work, we model the relations in a weakly supervised manner, i.e., without using annotations or prior knowledge about the interacting persons or objects, for egocentric action recognition. We form a weakly supervised framework that unifies automatic interactor localization and explicit relation modeling to achieve automatic relation modeling. First, we learn to automatically localize the interactors, i.e., the body parts of the camera wearer and the persons or objects that the camera wearer interacts with, by learning a series of keypoints directly from video data to localize the action-relevant regions using only action labels and some constraints on these keypoints. Second, and more importantly, to explicitly model the relations between the interactors, we develop an ego-relational LSTM (long short-term memory) network with several candidate connections to model the complex relations in egocentric videos, such as the temporal, interactive, and contextual relations. In particular, to reduce the human effort and manual intervention needed to construct an optimal ego-relational LSTM structure, we search for the optimal connections with a differentiable network architecture search mechanism, which automatically constructs the ego-relational LSTM network to explicitly model different relations for egocentric action recognition. We conduct extensive experiments on egocentric video datasets to demonstrate the effectiveness of our method.
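
The second stage described in the abstract couples candidate connections in the ego-relational LSTM with a differentiable architecture search over them. The sketch below is a minimal, hypothetical illustration of that general idea, not the paper's implementation: a PyTorch recurrent cell that mixes temporal, interactive, and contextual inputs through softmax-weighted architecture parameters, in the spirit of DARTS-style continuous relaxation. All names (RelationalCell, alpha, the feature dimensions) are assumptions made for illustration.

# A minimal sketch, assuming PyTorch and simplified feature dimensions;
# module and variable names are hypothetical, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationalCell(nn.Module):
    """One recurrent step that fuses interactor features through a
    softmax-weighted mixture of candidate connections (DARTS-style)."""
    def __init__(self, dim):
        super().__init__()
        # Candidate connections: previous hidden state (temporal), the
        # localized interactor feature (interactive), and the global
        # frame feature (contextual).
        self.temporal = nn.Linear(dim, dim)
        self.interactive = nn.Linear(dim, dim)
        self.contextual = nn.Linear(dim, dim)
        # Continuous architecture parameters relaxed over the candidates.
        self.alpha = nn.Parameter(torch.zeros(3))
        self.cell = nn.LSTMCell(dim, dim)

    def forward(self, h, c, interactor_feat, context_feat):
        w = F.softmax(self.alpha, dim=0)
        mixed = (w[0] * self.temporal(h)
                 + w[1] * self.interactive(interactor_feat)
                 + w[2] * self.contextual(context_feat))
        return self.cell(mixed, (h, c))

# Toy usage on random features standing in for per-frame descriptors
# pooled around the automatically learned keypoints.
if __name__ == "__main__":
    dim, num_frames, num_actions = 256, 8, 10
    cell = RelationalCell(dim)
    classifier = nn.Linear(dim, num_actions)
    h = torch.zeros(1, dim)
    c = torch.zeros(1, dim)
    for _ in range(num_frames):
        interactor = torch.randn(1, dim)  # localized interactor feature
        context = torch.randn(1, dim)     # global context feature of the frame
        h, c = cell(h, c, interactor, context)
    print(classifier(h).shape)            # action logits: torch.Size([1, 10])

In standard differentiable architecture search, the alpha parameters and the network weights would be optimized on separate data splits and the strongest connections retained after search; the paper's actual search space and training scheme may differ from this simplified picture.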

