Urooj Bushra, Fayaz Muhammad, Ali Shafqat, Dang L Minh, Kim Kyung Won
Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, University of Ulsan College of Medicine, Seoul 05505, Republic of Korea.
Department of Computer Science and Engineering, Sejong University, Seoul 05006, Republic of Korea.
Bioengineering (Basel). 2025 Jul 29;12(8):818. doi: 10.3390/bioengineering12080818.
Integrating vision and language processing into a cohesive system has already shown promise through the application of large language models (LLMs) to medical image analysis. Their capabilities span medical report generation, disease classification, visual question answering, and segmentation, offering another approach to interpreting multimodal data. This survey aims to compile the known applications of LLMs in medical image analysis, highlighting their promise alongside critical challenges and future directions. We introduce the concept of X-stage tuning, a framework for fine-tuning LLMs across multiple stages (zero-stage, one-stage, and multi-stage), in which each stage corresponds to task complexity and available data. The survey discusses challenges such as data sparsity, hallucinated outputs, privacy concerns, and the need for dynamic knowledge updating. We also cover prospective directions, including the integration of LLMs with decision support systems, multimodal learning, and federated learning for privacy-preserving model training. The goal of this work is to provide structured guidance to its target audience and to clarify the prospects of LLMs in medical image analysis.