
Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm.

Authors

Kaya Semih, Vural Elif

Publication information

IEEE Trans Image Process. 2021;30:4384-4394. doi: 10.1109/TIP.2021.3071688. Epub 2021 Apr 21.

DOI: 10.1109/TIP.2021.3071688
PMID: 33848248

Abstract

While many approaches exist in the literature to learn low-dimensional representations for data collections in multiple modalities, the generalizability of multi-modal nonlinear embeddings to previously unseen data is a rather overlooked subject. In this work, we first present a theoretical analysis of learning multi-modal nonlinear embeddings in a supervised setting. Our performance bounds indicate that for successful generalization in multi-modal classification and retrieval problems, the regularity of the interpolation functions extending the embedding to the whole data space is as important as the between-class separation and cross-modal alignment criteria. We then propose a multi-modal nonlinear representation learning algorithm that is motivated by these theoretical findings, where the embeddings of the training samples are optimized jointly with the Lipschitz regularity of the interpolators. Experimental comparison to recent multi-modal and single-modal learning algorithms suggests that the proposed method yields promising performance in multi-modal image classification and cross-modal image-text retrieval applications.
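
The role the abstract gives to the Lipschitz regularity of the interpolators can be illustrated with a small numerical sketch. This is not the authors' algorithm: it assumes a Gaussian RBF interpolator (the abstract does not say which interpolator is used), and every function name, the kernel scale `sigma`, and the `ridge` term are hypothetical choices made for illustration. The sketch extends a given training embedding to new points and computes a simple upper bound on the Lipschitz constant of the resulting map, which is the kind of quantity a method could penalize alongside class separation and cross-modal alignment.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Pairwise kernel values exp(-||a - b||^2 / sigma^2) between rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma**2)

def fit_rbf_interpolator(X, Y, sigma, ridge=1e-8):
    # Solve (K + ridge * I) C = Y so that f(x_i) is approximately y_i.
    # The small ridge term is an assumption added for numerical stability.
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + ridge * np.eye(len(X)), Y)

def interpolate(X_new, X, C, sigma):
    # f(x) = sum_i c_i * exp(-||x - x_i||^2 / sigma^2)
    return gaussian_kernel(X_new, X, sigma) @ C

def lipschitz_upper_bound(C, sigma):
    # Each Gaussian kernel is Lipschitz with constant sqrt(2) * exp(-1/2) / sigma,
    # so by the triangle inequality f is Lipschitz with at most that constant
    # times the sum of the coefficient norms ||c_i||.
    kernel_lipschitz = np.sqrt(2.0) * np.exp(-0.5) / sigma
    return kernel_lipschitz * np.linalg.norm(C, axis=1).sum()

def empirical_lipschitz(X_probe, X, C, sigma):
    # Largest observed ratio ||f(x) - f(x')|| / ||x - x'|| over probe pairs.
    F = interpolate(X_probe, X, C, sigma)
    dx = np.linalg.norm(X_probe[:, None, :] - X_probe[None, :, :], axis=2)
    df = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)
    mask = dx > 0
    return (df[mask] / dx[mask]).max()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))   # training samples of one modality (synthetic)
    Y = rng.normal(size=(20, 2))   # their low-dimensional embedding (synthetic)
    sigma = 2.0
    C = fit_rbf_interpolator(X, Y, sigma)
    probes = rng.normal(size=(50, 5))
    print("upper bound:", lipschitz_upper_bound(C, sigma))
    print("empirical  :", empirical_lipschitz(probes, X, C, sigma))
```

In this sketch the bound depends on the embedding only through the interpolator coefficients, so changing the training embedding `Y` changes how regular the out-of-sample extension is. That coupling is one way to read the abstract's statement that the embeddings are optimized jointly with the regularity of the interpolators.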

Similar articles

1
Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm.
IEEE Trans Image Process. 2021;30:4384-4394. doi: 10.1109/TIP.2021.3071688. Epub 2021 Apr 21.
2
Hypergraph-Based Multi-Modal Representation for Open-Set 3D Object Retrieval.
IEEE Trans Pattern Anal Mach Intell. 2024 Apr;46(4):2206-2223. doi: 10.1109/TPAMI.2023.3332768. Epub 2024 Mar 6.
3
ModalChorus: Visual Probing and Alignment of Multi-Modal Embeddings via Modal Fusion Map.
IEEE Trans Vis Comput Graph. 2025 Jan;31(1):294-304. doi: 10.1109/TVCG.2024.3456387. Epub 2024 Nov 25.
4
Deep Relation Embedding for Cross-Modal Retrieval.
IEEE Trans Image Process. 2021;30:617-627. doi: 10.1109/TIP.2020.3038354. Epub 2020 Dec 1.
5
Generalized Multi-View Embedding for Visual Recognition and Cross-Modal Retrieval.
IEEE Trans Cybern. 2018 Sep;48(9):2542-2555. doi: 10.1109/TCYB.2017.2742705. Epub 2017 Sep 6.
6
Joint Feature Synthesis and Embedding: Adversarial Cross-Modal Retrieval Revisited.
IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):3030-3047. doi: 10.1109/TPAMI.2020.3045530. Epub 2022 May 5.
7
Cross-modal distribution alignment embedding network for generalized zero-shot learning.
Neural Netw. 2022 Apr;148:176-182. doi: 10.1016/j.neunet.2022.01.007. Epub 2022 Jan 29.
8
Fine-Grained Cross-Modal Semantic Consistency in Natural Conservation Image Data from a Multi-Task Perspective.
Sensors (Basel). 2024 May 14;24(10):3130. doi: 10.3390/s24103130.
9
Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval.
Neural Netw. 2021 Feb;134:143-162. doi: 10.1016/j.neunet.2020.11.011. Epub 2020 Nov 28.
10
Collective Reconstructive Embeddings for Cross-modal Hashing.
IEEE Trans Image Process. 2018 Dec 28. doi: 10.1109/TIP.2018.2890144.