Comparison of Multiple State-of-the-Art Large Language Models for Patient Education Prior to CT and MRI Examinations.

Authors

Eminovic Semil, Levita Bogdan, Dell'Orco Andrea, Leppig Jonas Alexander, Nawabi Jawed, Penzkofer Tobias

Affiliations

Department of Radiology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353 Berlin, Germany.

Department of Neuroradiology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353 Berlin, Germany.

Publication

J Pers Med. 2025 Jun 5;15(6):235. doi: 10.3390/jpm15060235.


DOI: 10.3390/jpm15060235
PMID: 40559098
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12194482/
Abstract

Background/Objectives: This study compares the accuracy of responses from state-of-the-art large language models (LLMs) to patient questions before CT and MRI imaging. We aim to demonstrate the potential of LLMs in improving workflow efficiency, while also highlighting risks such as misinformation. Methods: A total of 57 CT-related and 64 MRI-related patient questions were presented to ChatGPT-4o, Claude 3.5 Sonnet, Google Gemini, and Mistral Large 2. Each answer was evaluated by two board-certified radiologists and scored for accuracy/correctness/likelihood to mislead using a 5-point Likert scale. Statistical analysis compared LLM performance across question categories. Results: ChatGPT-4o achieved the highest average scores for CT-related questions and tied with Claude 3.5 Sonnet for MRI-related questions, with higher scores across all models for MRI (ChatGPT-4o: CT [4.52 (± 0.46)], MRI [4.79 (± 0.37)]; Google Gemini: CT [4.44 (± 0.58)], MRI [4.68 (± 0.58)]; Claude 3.5 Sonnet: CT [4.40 (± 0.59)], MRI [4.79 (± 0.37)]; Mistral Large 2: CT [4.25 (± 0.54)], MRI [4.74 (± 0.47)]). At least one response per LLM was rated as inaccurate, with Google Gemini most often giving potentially misleading answers (5.26% for CT and 2.34% for MRI). Mistral Large 2 was outperformed by ChatGPT-4o across all CT-related questions (p < 0.001) and by ChatGPT-4o (p = 0.003), Google Gemini (p = 0.022), and Claude 3.5 Sonnet (p = 0.004) on the CT contrast media information questions. Conclusions: Even though all LLMs performed well overall and showed great potential for patient education, each model occasionally produced potentially misleading information, highlighting the risk of clinical application.
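The misleading-answer percentages reported for Google Gemini can be sanity-checked with a short arithmetic sketch. It rests on an assumption the abstract does not state: that each percentage is taken over individual ratings (questions × two radiologists) rather than over questions. The flagged counts of 6 and 3 are likewise hypothetical, chosen only because they reproduce the reported figures; the function name is illustrative.

```python
# Question counts and rater count are taken from the abstract.
CT_QUESTIONS, MRI_QUESTIONS, RATERS = 57, 64, 2


def misleading_share(flagged_ratings: int, questions: int, raters: int = RATERS) -> float:
    """Percentage of individual ratings flagged as potentially misleading.

    Assumption: the denominator is questions * raters (one rating per
    radiologist per question), which the abstract does not confirm.
    """
    return round(100 * flagged_ratings / (questions * raters), 2)


# Hypothetical flagged counts (not given in the abstract).
ct_share = misleading_share(6, CT_QUESTIONS)    # 6 of 114 ratings
mri_share = misleading_share(3, MRI_QUESTIONS)  # 3 of 128 ratings

print(f"CT: {ct_share}%  MRI: {mri_share}%")
```

Under this reading both reported values (5.26% and 2.34%) come out of whole-number counts. A per-question denominator would not work for MRI, since 2.34% of 64 questions is 1.5 questions.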

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a03/12194482/d99ed6a0e756/jpm-15-00235-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a03/12194482/005b4a0fb05b/jpm-15-00235-g001.jpg

Similar articles

[1]
Comparison of Multiple State-of-the-Art Large Language Models for Patient Education Prior to CT and MRI Examinations.

J Pers Med. 2025-6-5

[2]
Enhancing the Readability of Online Patient Education Materials Using Large Language Models: Cross-Sectional Study.

J Med Internet Res. 2025-6-4

[3]
Evaluating text and visual diagnostic capabilities of large language models on questions related to the Breast Imaging Reporting and Data System Atlas 5 edition.

Diagn Interv Radiol. 2025-3-3

[4]
Evaluation of Large Language Model Performance in Answering Clinical Questions on Periodontal Furcation Defect Management.

Dent J (Basel). 2025-6-18

[5]
Evaluation of Vision-Language Models for Detection and Deidentification of Medical Images with Burned-In Protected Health Information.

Radiology. 2025-6

[6]
Performance of ChatGPT-4o and Four Open-Source Large Language Models in Generating Diagnoses Based on China's Rare Disease Catalog: Comparative Study.

J Med Internet Res. 2025-6-18

[7]
Evaluating Large Language Models for Preoperative Patient Education in Superior Capsular Reconstruction: Comparative Study of Claude, GPT, and Gemini.

JMIR Perioper Med. 2025-6-12

[8]
Data extraction from free-text stroke CT reports using GPT-4o and Llama-3.3-70B: the impact of annotation guidelines.

Eur Radiol Exp. 2025-6-19

[9]
Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.

JMIR Form Res. 2025-5-20

[10]
Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study.

J Med Internet Res. 2025-6-4
