Suppr 超能文献



Modality Compensation Network: Cross-Modal Adaptation for Action Recognition.

Authors

Song Sijie, Liu Jiaying, Li Yanghao, Guo Zongming

Publication

IEEE Trans Image Process. 2020 Jan 23. doi: 10.1109/TIP.2020.2967577.

DOI: 10.1109/TIP.2020.2967577
PMID: 31995485
Abstract

With the prevalence of RGB-D cameras, multimodal video data have become more available for human action recognition. One main challenge for this task lies in how to effectively leverage their complementary information. In this work, we propose a Modality Compensation Network (MCN) to explore the relationships between different modalities and boost the representations for human action recognition. We regard RGB/optical-flow videos as source modalities and skeletons as the auxiliary modality. Our goal is to extract more discriminative features from the source modalities with the help of the auxiliary modality. Built on deep Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, our model bridges data from the source and auxiliary modalities with a modality adaptation block to achieve adaptive representation learning, so that the network learns to compensate for the loss of skeletons at test time and even at training time. We explore multiple adaptation schemes to narrow the distance between the source and auxiliary modal distributions at different levels, according to the alignment of source and auxiliary data in training. In addition, skeletons are required only in the training phase; our model is able to improve recognition performance with source data alone at test time. Experimental results reveal that MCN outperforms state-of-the-art approaches on four widely used action recognition benchmarks.
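The adaptation idea in the abstract is to align the distributions of source (RGB/optical-flow) and auxiliary (skeleton) features during training, so that skeletons are no longer needed at test time. A minimal NumPy sketch of that idea, using a linear-kernel MMD alignment loss and a toy linear projection; the class name, training loop, and learning rate are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def mmd_linear(a, b):
    """Linear-kernel Maximum Mean Discrepancy: squared distance
    between the mean feature vectors of two modalities."""
    diff = a.mean(axis=0) - b.mean(axis=0)
    return float(diff @ diff)

class ModalityAdaptationBlock:
    """Toy stand-in for an adaptation block: a linear projection of
    source features, trained so that the projected source distribution
    moves toward the auxiliary (skeleton) distribution."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.1 * rng.standard_normal((dim, dim))

    def forward(self, x):
        # Used at training AND test time; needs no skeleton input.
        return x @ self.W

    def train_step(self, src, aux, lr=0.005):
        # Closed-form gradient of the linear-kernel MMD w.r.t. W:
        #   loss = || mean(src) @ W - mean(aux) ||^2
        m = src.mean(axis=0)
        diff = m @ self.W - aux.mean(axis=0)
        self.W -= lr * np.outer(m, 2.0 * diff)
        return mmd_linear(self.forward(src), aux)

# Align a shifted "RGB" feature cloud with a "skeleton" feature cloud.
rng = np.random.default_rng(1)
src = rng.standard_normal((32, 8)) + 2.0   # source-modality features
aux = rng.standard_normal((32, 8))         # auxiliary-modality features
block = ModalityAdaptationBlock(dim=8)
losses = [block.train_step(src, aux) for _ in range(50)]
```

The key property this mirrors from the abstract: the auxiliary features enter only through the alignment loss inside `train_step`, while inference uses `block.forward` on source features alone, so the skeleton modality is a training-time-only dependency.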


Similar Articles

1. Modality Compensation Network: Cross-Modal Adaptation for Action Recognition.
   IEEE Trans Image Process. 2020 Jan 23. doi: 10.1109/TIP.2020.2967577.
2. View Adaptive Neural Networks for High Performance Skeleton-Based Human Action Recognition.
   IEEE Trans Pattern Anal Mach Intell. 2019 Aug;41(8):1963-1978. doi: 10.1109/TPAMI.2019.2896631. Epub 2019 Jan 31.
3. Learning with Privileged Information via Adversarial Discriminative Modality Distillation.
   IEEE Trans Pattern Anal Mach Intell. 2020 Oct;42(10):2581-2593. doi: 10.1109/TPAMI.2019.2929038. Epub 2019 Jul 16.
4. MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos.
   IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3522-3538. doi: 10.1109/TPAMI.2022.3177813. Epub 2023 Feb 3.
5. Discriminative Relational Representation Learning for RGB-D Action Recognition.
   IEEE Trans Image Process. 2016 Jun;25(6):2856-2865. doi: 10.1109/TIP.2016.2556940. Epub 2016 Apr 20.
6. Deep Image-to-Video Adaptation and Fusion Networks for Action Recognition.
   IEEE Trans Image Process. 2019 Dec 11. doi: 10.1109/TIP.2019.2957930.
7. Cross-Modal Object Tracking via Modality-Aware Fusion Network and a Large-Scale Dataset.
   IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6981-6994. doi: 10.1109/TNNLS.2024.3406189. Epub 2025 Apr 8.
8. A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations.
   Med Phys. 2023 Sep;50(9):5460-5478. doi: 10.1002/mp.16338. Epub 2023 Mar 15.
9. Discriminative Cross-Modal Transfer Learning and Densely Cross-Level Feedback Fusion for RGB-D Salient Object Detection.
   IEEE Trans Cybern. 2020 Nov;50(11):4808-4820. doi: 10.1109/TCYB.2019.2934986. Epub 2019 Aug 30.
10. DANet: Semi-supervised differentiated auxiliaries guided network for video action recognition.
   Neural Netw. 2023 Jan;158:121-131. doi: 10.1016/j.neunet.2022.11.009. Epub 2022 Nov 17.