Comparison of Multiple State-of-the-Art Large Language Models for Patient Education Prior to CT and MRI Examinations.

Authors

Eminovic Semil, Levita Bogdan, Dell'Orco Andrea, Leppig Jonas Alexander, Nawabi Jawed, Penzkofer Tobias

Affiliations

Department of Radiology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353 Berlin, Germany.

Department of Neuroradiology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353 Berlin, Germany.

Publication details

J Pers Med. 2025 Jun 5;15(6):235. doi: 10.3390/jpm15060235.

DOI:10.3390/jpm15060235
PMID:40559098
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12194482/
Abstract

Background/Objectives: This study compares the accuracy of responses from state-of-the-art large language models (LLMs) to patient questions asked before CT and MRI examinations. We aim to demonstrate the potential of LLMs to improve workflow efficiency while also highlighting risks such as misinformation.

Methods: A total of 57 CT-related and 64 MRI-related patient questions were presented to ChatGPT-4o, Claude 3.5 Sonnet, Google Gemini, and Mistral Large 2. Two board-certified radiologists evaluated each answer and scored it for accuracy, correctness, and likelihood to mislead on a 5-point Likert scale. Statistical analysis compared LLM performance across question categories.

Results: ChatGPT-4o achieved the highest average score for CT-related questions and tied with Claude 3.5 Sonnet for MRI-related questions; every model scored higher on MRI than on CT (ChatGPT-4o: CT 4.52 ± 0.46, MRI 4.79 ± 0.37; Google Gemini: CT 4.44 ± 0.58, MRI 4.68 ± 0.58; Claude 3.5 Sonnet: CT 4.40 ± 0.59, MRI 4.79 ± 0.37; Mistral Large 2: CT 4.25 ± 0.54, MRI 4.74 ± 0.47). At least one response from each LLM was rated as inaccurate, and Google Gemini most often gave potentially misleading answers (5.26% for CT and 2.34% for MRI). ChatGPT-4o outperformed Mistral Large 2 across all CT-related questions (p < 0.001), and ChatGPT-4o (p = 0.003), Google Gemini (p = 0.022), and Claude 3.5 Sonnet (p = 0.004) each outperformed Mistral Large 2 on the CT contrast media information questions.

Conclusions: Although all LLMs performed well overall and showed great potential for patient education, each model occasionally produced potentially misleading information, highlighting the risks of clinical application.
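The summary statistics in the Results can be made concrete with a little arithmetic. The Python sketch below rests on assumptions the abstract does not confirm: the data layout (one pair of 1-5 ratings per answer), the cut-off of 2 or lower for a "potentially misleading" rating (the hypothetical constant MISLEADING_MAX), and summarizing scores as the mean and sample SD of per-answer rater averages are illustrative choices, not the authors' method. The last lines check one reading of the reported shares. With two raters there are 57 × 2 = 114 CT ratings and 64 × 2 = 128 MRI ratings, and 6/114 ≈ 5.26% and 3/128 ≈ 2.34% both match the reported figures. By contrast, 2.34% of 64 questions would be 1.5 answers, which is not a whole number.

```python
from statistics import mean, stdev

# Hypothetical layout (not from the paper): one (rater_1, rater_2) pair of
# 1-5 Likert scores per answer, for a single model and modality.
Ratings = list[tuple[int, int]]

# Assumed cut-off: ratings of 2 or lower count as "potentially misleading".
# The abstract does not say which Likert values the authors counted.
MISLEADING_MAX = 2


def summarize(ratings: Ratings) -> tuple[float, float, float]:
    """Return (mean, sample SD, % misleading ratings) for one model/modality.

    Mean and SD are computed over per-answer averages of the two raters
    (an assumption); the misleading share is counted over individual ratings.
    """
    per_answer = [(r1 + r2) / 2 for r1, r2 in ratings]
    individual = [score for pair in ratings for score in pair]
    flagged = sum(1 for score in individual if score <= MISLEADING_MAX)
    return mean(per_answer), stdev(per_answer), 100 * flagged / len(individual)


# Toy ratings for illustration only; these are not study data.
toy = [(5, 5), (4, 5), (2, 3), (5, 4)]
print(summarize(toy))

# Cross-check of the reported shares, assuming a denominator of
# questions x 2 raters: 6 of 114 CT ratings and 3 of 128 MRI ratings.
ct_share = 100 * 6 / (57 * 2)
mri_share = 100 * 3 / (64 * 2)
print(round(ct_share, 2), round(mri_share, 2))  # 5.26 2.34
```

Under these assumptions the toy data give a mean of 4.125 and a 12.5% misleading share. Averaging individual ratings instead of per-answer means, or choosing a different cut-off, would change the numbers, so the sketch only illustrates how such summary figures can be derived.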

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a03/12194482/d99ed6a0e756/jpm-15-00235-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a03/12194482/005b4a0fb05b/jpm-15-00235-g001.jpg