Liang Manshu, Wu Wenbin, Chen Zhuolei, Han Tengfei, Zheng Yuan
Electric Power Science Research Institute, State Grid Fujian Electric Power Co. Ltd., Fujian, China.
School of Computer Science, Civil Aviation Flight University of China, Deyang, China.
PLoS One. 2025 Jul 24;20(7):e0327302. doi: 10.1371/journal.pone.0327302. eCollection 2025.
Reasoning about temporal relations among action-related objects plays an important role in action recognition. However, previous approaches focus their reasoning only on high-level semantics and inevitably involve the background in that reasoning. In this work, we propose to formulate temporal relational reasoning in an action-centric and hierarchical style, with a novel Action-centric Temporal-relational Reasoning (ATR) block. Specifically, ATR comprises two components: an Action-related Region Locator (ARL), which locates action-related regions by estimating actionness, and an Efficient Action-centric Reasoner (EAR), which efficiently reasons about the temporal relations between the located regions, thereby realizing action-centric reasoning. Thanks to its flexible and efficient design, ATR can be integrated directly into existing action recognition models at different depths, enabling hierarchical reasoning on action-centric temporal relations with only minor computational overhead. We extensively evaluate the ATR block on three action recognition benchmarks, Something-Something V1, Something-Something V2, and Kinetics, using TSN, TSM, and SlowOnly backbones. The consistent and notable improvements over strong baselines corroborate the effectiveness of ATR, as well as of the action-centric and hierarchical formulation of temporal relational reasoning. The proposed approach may also be of practical value in real-world scenarios.
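The abstract describes ATR only at the level of its two components, so the following is a minimal, hypothetical sketch of the general idea under assumptions of our own, not the authors' implementation. It assumes actionness is `sigmoid(w . x)` for a made-up weight vector `w`, that "locating" means keeping the top-`k` positions per frame (ARL-like step), and that relational reasoning is plain dot-product attention from each located region to the located regions of the other frames, plus a residual connection (EAR-like step). The function names `locate_regions` and `reason_temporal` are illustrative only.

```python
import math
from typing import List

Vec = List[float]
Frame = List[Vec]  # one feature vector per spatial position


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def locate_regions(clip: List[Frame], w: Vec, k: int) -> List[List[Vec]]:
    """ARL-like step (assumed): score every position with an actionness
    value sigmoid(w . x) and keep the k highest-scoring positions per
    frame, each feature vector scaled by its actionness score."""
    located = []
    for frame in clip:
        scores = [sigmoid(dot(w, x)) for x in frame]
        top = sorted(range(len(frame)), key=lambda i: scores[i], reverse=True)[:k]
        located.append([[scores[i] * v for v in frame[i]] for i in top])
    return located


def reason_temporal(regions: List[List[Vec]]) -> List[List[Vec]]:
    """EAR-like step (assumed): each located region attends only to the
    located regions of the other frames, so discarded (background)
    positions never enter the relation computation.
    Output = input region + attended temporal context."""
    out = []
    for t, frame_regions in enumerate(regions):
        others = [r for s, rs in enumerate(regions) if s != t for r in rs]
        new_frame = []
        for q in frame_regions:
            if not others:  # single-frame clip: nothing to relate to
                new_frame.append(list(q))
                continue
            weights = softmax([dot(q, r) for r in others])
            context = [sum(wt * r[c] for wt, r in zip(weights, others))
                       for c in range(len(q))]
            new_frame.append([qc + cc for qc, cc in zip(q, context)])
        out.append(new_frame)
    return out


# Toy clip: 2 frames x 3 positions x 2 channels; position 1 is an all-zero
# "background" vector in both frames.
clip = [
    [[1.0, 0.0], [0.0, 0.0], [0.2, 0.1]],
    [[0.9, 0.1], [0.0, 0.0], [0.1, 0.2]],
]
w = [4.0, 0.0]  # hypothetical actionness weights
regions = locate_regions(clip, w, k=1)
out = reason_temporal(regions)
```

In this sketch, attention runs over `T * k` located regions rather than all `T * N` positions, which is one plausible reading of how restricting reasoning to action-related regions keeps the overhead small; the paper's actual mechanism may differ.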