Human-Centric Transformer for Domain Adaptive Action Recognition.

Author Information

Lin Kun-Yu, Zhou Jiaming, Zheng Wei-Shi

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2025 Feb;47(2):679-696. doi: 10.1109/TPAMI.2024.3429387. Epub 2025 Jan 9.

Abstract

We study the domain adaptation task for action recognition, namely domain adaptive action recognition, which aims to effectively transfer action recognition power from a label-sufficient source domain to a label-free target domain. Since actions are performed by humans, it is crucial to exploit human cues in videos when recognizing actions across domains. However, existing methods are prone to losing human cues and instead exploit the correlation between non-human contexts and associated actions for recognition; attending to contexts that are agnostic to actions reduces recognition performance in the target domain. To overcome this problem, we focus on uncovering human-centric action cues for domain adaptive action recognition, investigating two aspects of such cues: human cues and human-context interaction cues. Accordingly, our proposed Human-Centric Transformer (HCTransformer) develops a decoupled human-centric learning paradigm to explicitly concentrate on human-centric action cues during domain-invariant video feature learning. HCTransformer first conducts human-aware temporal modeling with a human encoder, aiming to avoid a loss of human cues during domain-invariant video feature learning. Then, using a Transformer-like architecture, it exploits domain-invariant and action-correlated contexts through a context encoder, and further models the domain-invariant interaction between humans and action-correlated contexts. We conduct extensive experiments on three benchmarks, namely UCF-HMDB, Kinetics-NEC-Drone and EPIC-Kitchens-UDA, and the state-of-the-art performance demonstrates the effectiveness of our proposed HCTransformer.
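
The abstract describes the architecture only at a high level: a human encoder for human-aware temporal modeling, followed by a context encoder that models human-context interaction. The following is a minimal PyTorch sketch of how such a decoupled human/context design might be wired together. Everything here is an assumption for illustration: the module names (HumanEncoder, ContextEncoder, HCTransformerSketch), the token shapes, the cross-attention formulation of the interaction, and the pooling head are all hypothetical, and the sketch omits the paper's domain-adaptation objectives entirely.

```python
import torch
import torch.nn as nn

class HumanEncoder(nn.Module):
    """Temporal Transformer over per-frame human token features.

    Hypothetical: the abstract only states that a human encoder
    performs human-aware temporal modeling.
    """
    def __init__(self, dim=256, depth=2, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, human_tokens):  # (B, T, dim)
        return self.encoder(human_tokens)

class ContextEncoder(nn.Module):
    """Cross-attention from encoded human tokens to context tokens.

    One plausible reading of "models domain-invariant interaction
    between humans and action-correlated contexts"; the paper's
    actual formulation is not given in the abstract.
    """
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, human_feats, context_tokens):
        # Human features query the action-correlated context tokens.
        out, _ = self.attn(query=human_feats, key=context_tokens, value=context_tokens)
        return self.norm(human_feats + out)

class HCTransformerSketch(nn.Module):
    """Toy end-to-end wiring; dimensions and head are illustrative."""
    def __init__(self, dim=256, num_classes=12):
        super().__init__()
        self.human_enc = HumanEncoder(dim)
        self.context_enc = ContextEncoder(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, human_tokens, context_tokens):
        h = self.human_enc(human_tokens)         # human-aware temporal modeling
        z = self.context_enc(h, context_tokens)  # human-context interaction
        return self.head(z.mean(dim=1))          # clip-level action logits

# Usage with random tensors: 2 clips, 8 frames of human tokens,
# 16 context tokens per clip.
model = HCTransformerSketch()
logits = model(torch.randn(2, 8, 256), torch.randn(2, 16, 256))
print(logits.shape)  # torch.Size([2, 12])
```

Letting human tokens act as queries over context tokens mirrors the abstract's ordering (human cues first, then interaction with action-correlated context), but a real implementation would also need the adversarial or alignment losses that make the learned features domain-invariant.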
