Huang Shih-Cheng, Jensen Malte, Yeung-Levy Serena, Lungren Matthew P, Poon Hoifung, Chaudhari Akshay S
Stanford University.
Microsoft Research.
Res Sq. 2025 Apr 28:rs.3.rs-5537908. doi: 10.21203/rs.3.rs-5537908/v1.
Artificial Intelligence (AI) holds immense potential to transform healthcare, yet progress is often hindered by the reliance on large labeled datasets and unimodal data. Multimodal Foundation Models (FMs), particularly those leveraging Self-Supervised Learning (SSL) on multimodal data, offer a paradigm shift towards label-efficient, holistic patient modeling. However, the rapid emergence of these complex models has created a fragmented landscape. Here, we provide a systematic review of multimodal FMs for medical imaging applications. Through rigorous screening of 1,144 publications (2012-2024) and in-depth analysis of 48 studies, we establish a unified terminology and comprehensively assess the current state-of-the-art. Our review aggregates current knowledge, critically identifies key limitations and underexplored opportunities, and culminates in actionable guidelines for researchers, clinicians, developers, and policymakers. This work provides a crucial roadmap to navigate and accelerate the responsible development and clinical translation of next-generation multimodal AI in healthcare.