

Learning with Privileged Information via Adversarial Discriminative Modality Distillation.

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2020 Oct;42(10):2581-2593. doi: 10.1109/TPAMI.2019.2929038. Epub 2019 Jul 16.

DOI: 10.1109/TPAMI.2019.2929038
PMID: 31331879
Abstract

Heterogeneous data modalities can provide complementary cues for several tasks, usually leading to more robust algorithms and better performance. However, while training data can be accurately collected to include a variety of sensory modalities, it is often the case that not all of them are available in real life (testing) scenarios, where a model has to be deployed. This raises the challenge of how to extract information from multimodal data in the training stage, in a form that can be exploited at test time, considering limitations such as noisy or missing modalities. This paper presents a new approach in this direction for RGB-D vision tasks, developed within the adversarial learning and privileged information frameworks. We consider the practical case of learning representations from depth and RGB videos, while relying only on RGB data at test time. We propose a new approach to train a hallucination network that learns to distill depth information via adversarial learning, resulting in a clean approach without several losses to balance or hyperparameters. We report state-of-the-art results for object classification on the NYUD dataset, and video action recognition on the largest multimodal dataset available for this task, the NTU RGB+D, as well as on the Northwestern-UCLA.
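The adversarial distillation described in the abstract can be sketched in miniature. In the toy sketch below, all names, the linear encoders, and the training constants are illustrative assumptions, not the paper's architecture: a linear "hallucination" map over RGB-only features is trained to fool a frozen discriminator that was first trained to separate hallucinated features from privileged depth features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions for illustration): x are "RGB" features;
# z_real are privileged "depth" features from an unknown affine map.
# Real encoders are deep networks; linear maps keep the sketch small.
d_rgb, d_feat, n = 8, 4, 256
A = rng.normal(size=(d_feat, d_rgb))
mu = rng.normal(size=d_feat)
x = rng.normal(size=(n, d_rgb))
z_real = x @ A.T + mu                       # teacher (depth) features

# Hallucination network z_h = x @ W.T + c, trained adversarially so its
# features become indistinguishable from the depth features.
W = 0.1 * rng.normal(size=(d_feat, d_rgb))
c = np.zeros(d_feat)

# Linear logistic discriminator D(z) = sigmoid(z @ w + b).
w, b = np.zeros(d_feat), 0.0

def d_score(z):
    t = np.clip(z @ w + b, -30, 30)         # clip for numerical safety
    return 1.0 / (1.0 + np.exp(-t))

# Step 1: train D to label depth features 1 and hallucinated ones 0.
for _ in range(200):
    z_h = x @ W.T + c
    p_real, p_fake = d_score(z_real), d_score(z_h)
    g_w = ((p_real - 1) @ z_real + p_fake @ z_h) / n   # BCE gradient
    g_b = ((p_real - 1).sum() + p_fake.sum()) / n
    w -= 0.2 * g_w
    b -= 0.2 * g_b

score_before = d_score(x @ W.T + c).mean()

# Step 2: train the hallucination net against the frozen D with the
# non-saturating generator loss -log D(z_h).
for _ in range(300):
    z_h = x @ W.T + c
    p_fake = d_score(z_h)
    g_z = (p_fake - 1)[:, None] * w[None, :]           # dL/dz_h
    W -= 0.05 * (g_z.T @ x) / n
    c -= 0.05 * g_z.mean(axis=0)

score_after = d_score(x @ W.T + c).mean()
print(f"D(hallucinated): {score_before:.3f} -> {score_after:.3f}")
```

Because the hallucination step optimizes a single adversarial objective against the discriminator, there are no per-task loss weights to balance, which is the property the abstract highlights.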


Similar Articles

1. Learning with Privileged Information via Adversarial Discriminative Modality Distillation.
   IEEE Trans Pattern Anal Mach Intell. 2020 Oct;42(10):2581-2593. doi: 10.1109/TPAMI.2019.2929038. Epub 2019 Jul 16.
2. MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos.
   IEEE Trans Pattern Anal Mach Intell. 2023 Mar;45(3):3522-3538. doi: 10.1109/TPAMI.2022.3177813. Epub 2023 Feb 3.
3. Discriminative Relational Representation Learning for RGB-D Action Recognition.
   IEEE Trans Image Process. 2016 Jun;25(6):2856-2865. doi: 10.1109/TIP.2016.2556940. Epub 2016 Apr 20.
4. UTDNet: A unified triplet decoder network for multimodal salient object detection.
   Neural Netw. 2024 Feb;170:521-534. doi: 10.1016/j.neunet.2023.11.051. Epub 2023 Nov 24.
5. Learning Effective RGB-D Representations for Scene Recognition.
   IEEE Trans Image Process. 2018 Sep 28. doi: 10.1109/TIP.2018.2872629.
6. Modality Compensation Network: Cross-Modal Adaptation for Action Recognition.
   IEEE Trans Image Process. 2020 Jan 23. doi: 10.1109/TIP.2020.2967577.
7. Self-Paced Collaborative and Adversarial Network for Unsupervised Domain Adaptation.
   IEEE Trans Pattern Anal Mach Intell. 2021 Jun;43(6):2047-2061. doi: 10.1109/TPAMI.2019.2962476. Epub 2021 May 11.
8. Depth Privileged Scene Recognition via Dual Attention Hallucination.
   IEEE Trans Image Process. 2021;30:9164-9178. doi: 10.1109/TIP.2021.3122955. Epub 2021 Nov 10.
9. Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos.
   IEEE Trans Pattern Anal Mach Intell. 2018 May;40(5):1045-1058. doi: 10.1109/TPAMI.2017.2691321. Epub 2017 Apr 5.
10. CMOS-GAN: Semi-Supervised Generative Adversarial Model for Cross-Modality Face Image Synthesis.
    IEEE Trans Image Process. 2023;32:144-158. doi: 10.1109/TIP.2022.3226413. Epub 2022 Dec 19.

Cited By

1. Understanding action concepts from videos and brain activity through subjects' consensus.
   Sci Rep. 2022 Nov 9;12(1):19073. doi: 10.1038/s41598-022-23067-2.