
Egocentric Action Recognition by Automatic Relation Modeling

Authors

Li Haoxin, Zheng Wei-Shi, Zhang Jianguo, Hu Haifeng, Lu Jiwen, Lai Jian-Huang

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2023 Jan;45(1):489-507. doi: 10.1109/TPAMI.2022.3148790. Epub 2022 Dec 5.

Abstract

Egocentric videos, which record the daily activities of individuals from a first-person point of view, have attracted increasing attention in recent years because of their growing use in many popular applications, including life logging, health monitoring and virtual reality. As a fundamental problem in egocentric vision, egocentric action recognition aims to recognize the actions of the camera wearer from egocentric videos. In egocentric action recognition, relation modeling is important, because the interactions between the camera wearer and the recorded persons or objects form complex relations in egocentric videos. However, only a few existing methods model the relations between the camera wearer and the interacting persons for egocentric action recognition, and moreover, they require prior knowledge or auxiliary data to localize the interacting persons. In this work, we model the relations in a weakly supervised manner, i.e., without using annotations or prior knowledge about the interacting persons or objects, for egocentric action recognition. We form a weakly supervised framework that unifies automatic interactor localization and explicit relation modeling to achieve automatic relation modeling. First, we learn to automatically localize the interactors, i.e., the body parts of the camera wearer and the persons or objects that the camera wearer interacts with, by learning a series of keypoints directly from video data to localize the action-relevant regions using only action labels and some constraints on these keypoints. Second, and more importantly, to explicitly model the relations between the interactors, we develop an ego-relational LSTM (long short-term memory) network with several candidate connections to model the complex relations in egocentric videos, such as the temporal, interactive, and contextual relations. In particular, to reduce the human effort and manual intervention needed to construct an optimal ego-relational LSTM structure, we search for the optimal connections with a differentiable network architecture search mechanism, which automatically constructs the ego-relational LSTM network to explicitly model different relations for egocentric action recognition. We conduct extensive experiments on egocentric video datasets to demonstrate the effectiveness of our method.
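
The second stage described in the abstract couples candidate connections in the ego-relational LSTM with a differentiable architecture search over them. The sketch below is a minimal, hypothetical illustration of that general idea, not the paper's implementation: a PyTorch recurrent cell that mixes temporal, interactive, and contextual inputs through softmax-weighted architecture parameters, in the spirit of DARTS-style continuous relaxation. All names (RelationalCell, alpha, the feature dimensions) are assumptions made for illustration.

# A minimal sketch, assuming PyTorch and simplified feature dimensions;
# module and variable names are hypothetical, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationalCell(nn.Module):
    """One recurrent step that fuses interactor features through a
    softmax-weighted mixture of candidate connections (DARTS-style)."""
    def __init__(self, dim):
        super().__init__()
        # Candidate connections: previous hidden state (temporal), the
        # localized interactor feature (interactive), and the global
        # frame feature (contextual).
        self.temporal = nn.Linear(dim, dim)
        self.interactive = nn.Linear(dim, dim)
        self.contextual = nn.Linear(dim, dim)
        # Continuous architecture parameters relaxed over the candidates.
        self.alpha = nn.Parameter(torch.zeros(3))
        self.cell = nn.LSTMCell(dim, dim)

    def forward(self, h, c, interactor_feat, context_feat):
        w = F.softmax(self.alpha, dim=0)
        mixed = (w[0] * self.temporal(h)
                 + w[1] * self.interactive(interactor_feat)
                 + w[2] * self.contextual(context_feat))
        return self.cell(mixed, (h, c))

# Toy usage on random features standing in for per-frame descriptors
# pooled around the automatically learned keypoints.
if __name__ == "__main__":
    dim, num_frames, num_actions = 256, 8, 10
    cell = RelationalCell(dim)
    classifier = nn.Linear(dim, num_actions)
    h = torch.zeros(1, dim)
    c = torch.zeros(1, dim)
    for _ in range(num_frames):
        interactor = torch.randn(1, dim)  # localized interactor feature
        context = torch.randn(1, dim)     # global context feature of the frame
        h, c = cell(h, c, interactor, context)
    print(classifier(h).shape)            # action logits: torch.Size([1, 10])

In standard differentiable architecture search, the alpha parameters and the network weights would be optimized on separate data splits and the strongest connections retained after search; the paper's actual search space and training scheme may differ from this simplified picture.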

