Human-AI collaboration in large language model-assisted brain MRI differential diagnosis: a usability study.

Authors

Kim Su Hwan, Wihl Jonas, Schramm Severin, Berberich Cornelius, Rosenkranz Enrike, Schmitzer Lena, Serguen Kerem, Klenk Christopher, Lenhart Nicolas, Zimmer Claus, Wiestler Benedikt, Hedderich Dennis M

Affiliations

Department of Diagnostic and Interventional Neuroradiology, Klinikum rechts der Isar, School of Medicine and Health, Technical University Munich, Munich, Germany.

Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, School of Medicine and Health, Technical University Munich, Munich, Germany.

Publication information

Eur Radiol. 2025 Mar 7. doi: 10.1007/s00330-025-11484-6.

DOI:10.1007/s00330-025-11484-6
PMID:40055233
Abstract

OBJECTIVES

This study investigated the impact of human-large language model (LLM) collaboration on the accuracy and efficiency of brain MRI differential diagnosis.

MATERIALS AND METHODS

In this retrospective study, forty brain MRI cases with a challenging but definitive diagnosis were randomized into two groups of twenty cases each. Six radiology residents with an average experience of 6.3 months in reading brain MRI exams evaluated one set of cases supported by conventional internet search (Conventional) and the other set utilizing an LLM-based search engine and hybrid chatbot. A cross-over design ensured that each case was examined with both workflows in equal frequency. For each case, readers were instructed to determine the three most likely differential diagnoses. LLM responses were analyzed by a panel of radiologists. Benefits and challenges in human-LLM interaction were derived from observations and participant feedback.
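The cross-over allocation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the rule that alternates workflows by reader number, the seed, and the function name `crossover_assignment` are all assumptions made for the example. It only shows how splitting 40 cases into two sets of 20 and mirroring the workflows across six readers leads to every case being read equally often under each workflow.

```python
import random
from collections import Counter


def crossover_assignment(n_cases=40, n_readers=6, seed=0):
    """Build a (reader, case, workflow) schedule for a two-set cross-over.

    Hypothetical allocation rule: odd-numbered readers use the LLM on set A
    and conventional search on set B; even-numbered readers do the reverse.
    """
    rng = random.Random(seed)
    cases = list(range(1, n_cases + 1))
    rng.shuffle(cases)  # randomize cases into two sets of equal size
    half = n_cases // 2
    set_a, set_b = cases[:half], cases[half:]

    schedule = []
    for reader in range(1, n_readers + 1):
        llm_set, conv_set = (set_a, set_b) if reader % 2 else (set_b, set_a)
        schedule += [(reader, case, "LLM-assisted") for case in llm_set]
        schedule += [(reader, case, "Conventional") for case in conv_set]
    return schedule


schedule = crossover_assignment()
# Count how often each case is read under each workflow.
counts = Counter((case, workflow) for _, case, workflow in schedule)
```

Under these assumptions each of the 40 cases is read three times per workflow, which is one way to satisfy the "equal frequency" property stated in the methods.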

RESULTS

LLM-assisted brain MRI differential diagnosis yielded superior accuracy: 70/114 (61.4%) correct diagnoses with LLM assistance vs 53/114 (46.5%) with conventional internet search (p = 0.033, chi-square test). No difference in interpretation time or level of confidence was observed. An analysis of LLM responses revealed that correct LLM suggestions translated into correct reader responses in 82.1% of cases (60/73). Inaccurate case descriptions by readers (9.2% of cases), LLM hallucinations (11.5% of cases), and insufficient contextualization of LLM responses were identified as challenges related to human-LLM interaction.
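The accuracy comparison can be checked from the reported counts alone. The sketch below builds the 2x2 table (70 correct and 44 incorrect with LLM assistance, 53 correct and 61 incorrect with conventional search) and computes a Pearson chi-square test with one degree of freedom, using only the Python standard library. The abstract does not say whether a continuity correction was applied, so both variants are computed; the function name `chi_square_2x2` and the `yates` flag are choices made for this example.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test for the 2x2 table [[a, b], [c, d]] (1 degree of freedom).

    Returns (statistic, p_value). With yates=True, Yates' continuity
    correction is applied; whether the study used it is an assumption.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: workflow (LLM-assisted, conventional); columns: correct, incorrect.
stat_corrected, p_corrected = chi_square_2x2(70, 44, 53, 61, yates=True)
stat_plain, p_plain = chi_square_2x2(70, 44, 53, 61, yates=False)

print(f"with continuity correction:    chi2 = {stat_corrected:.3f}, p = {p_corrected:.4f}")
print(f"without continuity correction: chi2 = {stat_plain:.3f}, p = {p_plain:.4f}")
```

With these counts the continuity-corrected p-value comes out at roughly 0.03, close to the reported p = 0.033, while the uncorrected test gives a somewhat smaller value of roughly 0.02.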

CONCLUSION

Human-LLM collaboration has the potential to improve brain MRI differential diagnosis. Yet, several challenges must be addressed to ensure effective adoption and user acceptance.

KEY POINTS

Question: While large language models (LLM) have the potential to support radiological differential diagnosis, the role of human-LLM collaboration in this context remains underexplored.

Findings: LLM-assisted brain MRI differential diagnosis yielded superior accuracy over conventional internet search. Inaccurate case descriptions, LLM hallucinations, and insufficient contextualization were identified as potential challenges.

Clinical relevance: Our results highlight the potential of an LLM-assisted workflow to increase diagnostic accuracy but underline the necessity to study collaborative efforts between humans and LLMs over LLMs in isolation.
