Schiaffino Simone, Zhang Tianyu, Mann Ritse M, Pinker Katja
Imaging Institute of Southern Switzerland (IIMSI), Ente Ospedaliero Cantonale (EOC), Lugano, Switzerland.
Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland.
J Magn Reson Imaging. 2025 May 4. doi: 10.1002/jmri.29807.
This narrative review focuses on the integration of large language models (LLMs), such as GPT-4 and Gemini, into breast imaging. LLMs excel at understanding, processing, and generating human-like text, with potential applications ranging from clinical decision-making to radiology reporting support. LLMs show promise in addressing critical current challenges, including rising demand for imaging services alongside a growing shortage of radiologists. Their ability to integrate clinical guidelines and generate standardized, evidence-based reports has the potential to improve diagnostic consistency and reduce inter-reader variability. Emerging multimodal capabilities further extend their utility, enabling the integration of textual and visual data for tasks such as tumor classification and decision-making. Despite these advances, significant challenges remain. LLMs suffer from limitations such as hallucinations, biases in training datasets, and domain-specific knowledge gaps. These issues can compromise their reliability, particularly in nuanced tasks such as Breast Imaging Reporting and Data System (BI-RADS) categorization and multimodal image assessment. Moreover, ethical concerns about data privacy, biased outputs, and regulatory compliance must be addressed before effective deployment in the clinical setting. Current studies suggest that while LLMs can complement human expertise, their performance still lags behind that of radiologists in key areas, particularly tasks requiring complex medical reasoning or direct image analysis. Looking ahead, LLMs are poised to play a crucial role in breast imaging by optimizing workflows, supporting multidisciplinary meetings, and improving patient education. However, their successful integration will depend on proper context training, robust validation, and ethical oversight, with human supervision as an essential safeguard. EVIDENCE LEVEL: 5. TECHNICAL EFFICACY: Stage 2.