IEEE Trans Pattern Anal Mach Intell. 2021 Mar;43(3):858-872. doi: 10.1109/TPAMI.2019.2942028. Epub 2021 Feb 4.
Multimodal learning aims to discover the relationships between multiple modalities. It has become an important research topic due to extensive multimodal applications such as cross-modal retrieval. This paper attempts to address the modality heterogeneity problem using Gaussian process latent variable models (GPLVMs) to represent multimodal data in a common space. Previous multimodal GPLVM extensions generally adopt separate learning schemes for latent representations and kernel hyperparameters, which ignore their intrinsic relationship. To exploit the strong complementarity among different modalities and GPLVM components, we develop a novel learning scheme called harmonization, in which latent representations and kernel hyperparameters are jointly learned from each other. Beyond the correlation-fitting and intra-modal structure-preservation paradigms widely used in existing studies, the harmonization is derived in a model-driven manner to encourage agreement between the modality-specific GP kernels and the similarity of the latent representations. We present a range of multimodal learning models by incorporating the harmonization mechanism into several representative GPLVM-based approaches. Experimental results on four benchmark datasets show that the proposed models outperform strong baselines on cross-modal retrieval tasks, and that the harmonized multimodal learning method is superior in discovering semantically consistent latent representations.
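The abstract's core idea, encouraging agreement between each modality-specific GP kernel and the similarity of the shared latent representations, can be illustrated with a minimal numpy sketch. This is not the authors' objective; the RBF kernel, the linear latent-similarity matrix, and the Frobenius-norm penalty are all illustrative assumptions standing in for the model-driven derivation in the paper.

```python
import numpy as np

def rbf_kernel(X, lengthscale, variance):
    # Squared-exponential (RBF) GP kernel over latent points X (n x d).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def harmonization_penalty(X, hypers):
    # Toy "harmonization" term (assumed form): penalize disagreement
    # between each modality-specific kernel matrix K_m (built from the
    # shared latent X with that modality's hyperparameters) and a
    # similarity matrix S of the latent representations.
    # hypers: one (lengthscale, variance) pair per modality.
    S = X @ X.T  # linear similarity of latent points (an assumption)
    return sum(
        np.linalg.norm(rbf_kernel(X, ls, var) - S, "fro")
        for ls, var in hypers
    )

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))          # shared latent representations
hypers = [(1.0, 1.0), (2.0, 0.5)]    # two modalities' kernel hyperparameters
penalty = harmonization_penalty(X, hypers)
```

In a full model this penalty would be minimized jointly over both the latent points X and the per-modality hyperparameters, which is the "learned from each other" coupling the abstract describes, rather than optimizing each in isolation.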