Self-Supervised Multimodal Learning: A Survey.

Author information

Zong Yongshuo, Mac Aodha Oisin, Hospedales Timothy M

Publication information

IEEE Trans Pattern Anal Mach Intell. 2025 Jul;47(7):5299-5318. doi: 10.1109/TPAMI.2024.3429301.

DOI:10.1109/TPAMI.2024.3429301

Abstract

Multimodal learning, which aims to understand and analyze information from multiple modalities, has achieved substantial progress in the supervised regime in recent years. However, the heavy dependence on data paired with expensive human annotations impedes scaling up models. Meanwhile, given the availability of large-scale unannotated data in the wild, self-supervised learning has become an attractive strategy to alleviate the annotation bottleneck. Building on these two directions, self-supervised multimodal learning (SSML) provides ways to learn from raw multimodal data. In this survey, we provide a comprehensive review of the state-of-the-art in SSML, in which we elucidate three major challenges intrinsic to self-supervised learning with multimodal data: 1) learning representations from multimodal data without labels, 2) fusion of different modalities, and 3) learning with unaligned data. We then detail existing solutions to these challenges. Specifically, we consider 1) objectives for learning from multimodal unlabeled data via self-supervision, 2) model architectures from the perspective of different multimodal fusion strategies, and 3) pair-free learning strategies for coarse-grained and fine-grained alignment. We also review real-world applications of SSML algorithms in diverse fields, such as healthcare, remote sensing, and machine translation. Finally, we discuss challenges and future directions for SSML.

Similar articles

Boundary-aware information maximization for self-supervised medical image segmentation.

Med Image Anal. 2024 May;94:103150. doi: 10.1016/j.media.2024.103150. Epub 2024 Mar 28.

Semi-supervised learning from small annotated data and large unlabeled data for fine-grained Participants, Intervention, Comparison, and Outcomes entity recognition.

J Am Med Inform Assoc. 2025 Mar 1;32(3):555-565. doi: 10.1093/jamia/ocae326.

Semi-Supervised Learning Allows for Improved Segmentation With Reduced Annotations of Brain Metastases Using Multicenter MRI Data.

J Magn Reson Imaging. 2025 Jun;61(6):2469-2479. doi: 10.1002/jmri.29686. Epub 2025 Jan 10.

A segment anything model-guided and match-based semi-supervised segmentation framework for medical imaging.

Med Phys. 2025 Mar 29. doi: 10.1002/mp.17785.

Influence of early through late fusion on pancreas segmentation from imperfectly registered multimodal magnetic resonance imaging.

J Med Imaging (Bellingham). 2025 Mar;12(2):024008. doi: 10.1117/1.JMI.12.2.024008. Epub 2025 Apr 26.

Short-Term Memory Impairment

Exploring the Potential of Electroencephalography Signal-Based Image Generation Using Diffusion Models: Integrative Framework Combining Mixed Methods and Multimodal Analysis.

JMIR Med Inform. 2025 Jun 25;13:e72027. doi: 10.2196/72027.

Self-Supervised Contrastive Learning for Medical Time Series: A Systematic Review.

Sensors (Basel). 2023 Apr 23;23(9):4221. doi: 10.3390/s23094221.

Long-term care plan recommendation for older adults with disabilities: a bipartite graph transformer and self-supervised approach.

J Am Med Inform Assoc. 2025 Apr 1;32(4):689-701. doi: 10.1093/jamia/ocae327.

Cited by

Improvement of mask R-CNN and deep learning for defect detection and segmentation in electronic products.

PLoS One. 2025 Sep 8;20(9):e0329945. doi: 10.1371/journal.pone.0329945. eCollection 2025.

Leveraging neuroinformatics to understand cognitive phenotypes in elite athletes through systems neuroscience.

Front Neuroinform. 2025 Aug 19;19:1557879. doi: 10.3389/fninf.2025.1557879. eCollection 2025.

Multimodal learning for enhanced SPECT/CT imaging in sports injury diagnosis.

Front Physiol. 2025 Jul 29;16:1605426. doi: 10.3389/fphys.2025.1605426. eCollection 2025.

Cross modality learning of cell painting and transcriptomics data improves mechanism of action clustering and bioactivity modelling.

Sci Rep. 2025 Jul 2;15(1):23010. doi: 10.1038/s41598-025-05914-0.

Advancing Drug Discovery with Enhanced Chemical Understanding via Asymmetric Contrastive Multimodal Learning.

J Chem Inf Model. 2025 Jul 14;65(13):6547-6557. doi: 10.1021/acs.jcim.5c00430. Epub 2025 Jun 23.

Legal innovations for balancing environmental protection and public health in urban polluted areas.

Front Public Health. 2025 May 30;13:1557173. doi: 10.3389/fpubh.2025.1557173. eCollection 2025.

Enhancing gastroenterology with multimodal learning: the role of large language model chatbots in digestive endoscopy.

Front Med (Lausanne). 2025 May 21;12:1583514. doi: 10.3389/fmed.2025.1583514. eCollection 2025.

Combining spatial transcriptomics with tissue morphology.

Nat Commun. 2025 May 13;16(1):4452. doi: 10.1038/s41467-025-58989-8.

ModuCLIP: multi-scale CLIP framework for predicting foundation pit deformation in multi-modal robotic systems.

Front Neurorobot. 2025 Apr 1;19:1544694. doi: 10.3389/fnbot.2025.1544694. eCollection 2025.
