

Human-Centric Transformer for Domain Adaptive Action Recognition.

Authors

Lin Kun-Yu, Zhou Jiaming, Zheng Wei-Shi

Publication

IEEE Trans Pattern Anal Mach Intell. 2025 Feb;47(2):679-696. doi: 10.1109/TPAMI.2024.3429387. Epub 2025 Jan 9.

DOI: 10.1109/TPAMI.2024.3429387
PMID: 39012755
Abstract

We study the domain adaptation task for action recognition, namely domain adaptive action recognition, which aims to effectively transfer action recognition power from a label-sufficient source domain to a label-free target domain. Since actions are performed by humans, it is crucial to exploit human cues in videos when recognizing actions across domains. However, existing methods are prone to losing human cues and instead exploit the correlation between non-human contexts and associated actions for recognition; contexts that are agnostic to actions reduce recognition performance in the target domain. To overcome this problem, we focus on uncovering human-centric action cues for domain adaptive action recognition, investigating two aspects of such cues: human cues and human-context interaction cues. Accordingly, our proposed Human-Centric Transformer (HCTransformer) develops a decoupled human-centric learning paradigm to explicitly concentrate on human-centric action cues during domain-invariant video feature learning. Our HCTransformer first conducts human-aware temporal modeling with a human encoder, aiming to avoid a loss of human cues during domain-invariant video feature learning. Then, with a Transformer-like architecture, HCTransformer exploits domain-invariant and action-correlated contexts through a context encoder, and further models domain-invariant interaction between humans and action-correlated contexts. We conduct extensive experiments on three benchmarks, namely UCF-HMDB, Kinetics-NEC-Drone and EPIC-Kitchens-UDA, and the state-of-the-art performance demonstrates the effectiveness of our proposed HCTransformer.
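The decoupled pipeline the abstract describes (a human encoder for human-aware temporal modeling, a context encoder for action-correlated contexts, and modeling of human-context interaction) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the shapes, the linear-plus-ReLU "encoders", and the single-head attention used for the interaction step are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """Stand-in for an encoder: linear projection followed by ReLU."""
    return np.maximum(x @ w, 0.0)

def cross_attention(q, k, v):
    """Single-head scaled dot-product attention: human tokens attend to context tokens."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Hypothetical shapes: T video frames, D-dimensional features per frame.
T, D = 8, 16
human_feats = rng.normal(size=(T, D))    # e.g. pooled human-region features
context_feats = rng.normal(size=(T, D))  # e.g. non-human scene features

w_h = rng.normal(size=(D, D))  # "human encoder" weights
w_c = rng.normal(size=(D, D))  # "context encoder" weights

h = encode(human_feats, w_h)            # human-aware temporal features
c = encode(context_feats, w_c)          # action-correlated context features
interaction = cross_attention(h, c, c)  # human-context interaction modeling

# Pool into a video-level representation for classification.
video_repr = (h + interaction).mean(axis=0)
print(video_repr.shape)  # prints "(16,)"
```

The point of the decoupling is that human features are modeled on their own branch before any fusion, so context noise cannot overwrite them; the interaction step then reintroduces only contexts the human tokens attend to.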


Similar Articles

1. Human-Centric Transformer for Domain Adaptive Action Recognition.
IEEE Trans Pattern Anal Mach Intell. 2025 Feb;47(2):679-696. doi: 10.1109/TPAMI.2024.3429387. Epub 2025 Jan 9.
2. Cross-domain human action recognition.
IEEE Trans Syst Man Cybern B Cybern. 2012 Apr;42(2):298-307. doi: 10.1109/TSMCB.2011.2166761. Epub 2011 Sep 26.
3. Close Human Interaction Recognition Using Patch-Aware Models.
IEEE Trans Image Process. 2016 Jan;25(1):167-78. doi: 10.1109/TIP.2015.2498410. Epub 2015 Nov 5.
4. Learning a Deep Model for Human Action Recognition from Novel Viewpoints.
IEEE Trans Pattern Anal Mach Intell. 2018 Mar;40(3):667-681. doi: 10.1109/TPAMI.2017.2691768. Epub 2017 Apr 6.
5. Deeply Learned View-Invariant Features for Cross-View Action Recognition.
IEEE Trans Image Process. 2017 Jun;26(6):3028-3037. doi: 10.1109/TIP.2017.2696786. Epub 2017 Apr 24.
6. Learning Spatio-Temporal Representations for Action Recognition: A Genetic Programming Approach.
IEEE Trans Cybern. 2016 Jan;46(1):158-70. doi: 10.1109/TCYB.2015.2399172. Epub 2015 Feb 13.
7. Robust video content analysis schemes for human action recognition.
Sci Prog. 2021 Apr-Jun;104(2):368504211005480. doi: 10.1177/00368504211005480.
8. Desktop Action Recognition From First-Person Point-of-View.
IEEE Trans Cybern. 2019 May;49(5):1616-1628. doi: 10.1109/TCYB.2018.2806381. Epub 2018 Feb 27.
9. Explicit modeling of human-object interactions in realistic videos.
IEEE Trans Pattern Anal Mach Intell. 2013 Apr;35(4):835-48. doi: 10.1109/TPAMI.2012.175.
10. Action Spotting and Recognition Based on a Spatiotemporal Orientation Analysis.
IEEE Trans Pattern Anal Mach Intell. 2013 Mar;35(3):527-40. doi: 10.1109/TPAMI.2012.141.

Cited By

1. Lightweight and efficient skeleton-based sports activity recognition with ASTM-Net.
PLoS One. 2025 Jul 8;20(7):e0324605. doi: 10.1371/journal.pone.0324605. eCollection 2025.
2. An action decoding framework combined with deep neural network for predicting the semantics of human actions in videos from evoked brain activities.
Front Neuroinform. 2025 Feb 19;19:1526259. doi: 10.3389/fninf.2025.1526259. eCollection 2025.