Noh Seungha, Lee Byoung-Dai
Department of Computer Science, Graduate School, Kyonggi University, Suwon, Republic of Korea.
Quant Imaging Med Surg. 2025 Jun 6;15(6):5825-5858. doi: 10.21037/qims-2024-2826. Epub 2025 Jun 3.
Foundation models are deep learning models pretrained on extensive datasets that can be adapted to a wide range of downstream tasks. Recently, they have gained prominence across various domains, including medical imaging. These models exhibit remarkable contextual understanding and generalization capabilities, spurring active research in healthcare to develop versatile artificial intelligence solutions for real-world clinical environments. Motivated by these developments, this study offers a comprehensive review of foundation models in medical image segmentation (MIS), evaluates their zero-shot performance on diverse datasets, and assesses their practical applicability in clinical settings.
A total of 63 studies on foundation models for MIS, identified through platforms such as arXiv, ResearchGate, Google Scholar, Semantic Scholar, and PubMed, were systematically reviewed. Additionally, we curated 31 unseen medical image datasets from The Cancer Imaging Archive (TCIA), Kaggle, Zenodo, Institute of Electrical and Electronics Engineers (IEEE) DataPort, and Grand Challenge to evaluate the zero-shot performance of six foundation models. Performance was analyzed from multiple perspectives, including imaging modality and anatomical structure.
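As a minimal, illustrative sketch of how such a zero-shot evaluation can be set up in practice, the snippet below runs the publicly released Segment Anything (SAM) model on a single unseen 2D slice with one point prompt and reports a Dice score. The checkpoint name, file paths, and centroid-based prompt are assumptions for illustration; this is not the authors' evaluation pipeline.

```python
# Sketch: zero-shot segmentation of one unseen medical image slice with SAM,
# scored against a ground-truth mask using the Dice similarity coefficient.
# File names ("slice.png", "slice_gt.png") and the checkpoint are placeholders.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return (2.0 * np.logical_and(pred, gt).sum() + eps) / (pred.sum() + gt.sum() + eps)

# Load the pretrained model without any fine-tuning (zero-shot setting).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("slice.png").convert("RGB"))    # unseen test slice
gt_mask = np.array(Image.open("slice_gt.png")) > 0          # ground-truth mask

predictor.set_image(image)
# A single foreground point at the ground-truth centroid serves as the prompt.
ys, xs = np.nonzero(gt_mask)
point = np.array([[xs.mean(), ys.mean()]])
masks, scores, _ = predictor.predict(point_coords=point,
                                     point_labels=np.array([1]),
                                     multimask_output=False)
print(f"Dice = {dice(masks[0], gt_mask):.3f}")
```

In a full evaluation, this loop would be repeated over every case in each curated dataset and the resulting Dice scores aggregated per modality and anatomical structure.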
Foundation models were categorized based on a taxonomy that incorporates criteria such as data dimensions, modality coverage, prompt type, and training strategy. Furthermore, the zero-shot evaluation revealed key insights into their strengths and limitations across diverse imaging modalities. This analysis underscores the potential of these models in MIS while highlighting areas for improvement to optimize real-world applications.
Our findings provide a valuable resource for understanding the role of foundation models in MIS. By identifying their capabilities and limitations, this review lays the groundwork for advancing their practical deployment in clinical environments, supporting further innovation in medical image analysis.