Evaluation of ChatGPT-4o in Breast Cancer Screening: Insights from the 5th Edition BI-RADS Atlas and ACR Guidelines.

Authors

Özer Bilgen Mehpare, Korkmaz Eda Nur

Affiliation

Department of Radiology, Sincan Training and Research Hospital, Ankara, Turkey.

Publication information

J Imaging Inform Med. 2025 Sep 12. doi: 10.1007/s10278-025-01663-8.

DOI: 10.1007/s10278-025-01663-8
PMID: 40940588
Abstract

The aim of this study is to evaluate the potential, reliability, and limitations of ChatGPT-4o in answering text-based questions, and its effectiveness in clinical decision support, based on the 5th edition of the BI-RADS Atlas and the ACR breast cancer screening guidelines. In this study, a total of 100 questions (50 multiple-choice and 50 true/false) prepared by two radiologists were submitted to ChatGPT-4o between November 5 and 19. Both radiologists evaluated the answers provided by ChatGPT-4o for accuracy and comprehensiveness using a Likert scale, at baseline and 14 days later. Group comparisons were performed using the Mann-Whitney U and Wilcoxon tests; response consistency was evaluated with Cohen's kappa, and differences in overall accuracy with a two-proportion z-test. The increase in overall accuracy from 86% to 95% was statistically significant according to the two-proportion z-test (p = .030). Comparisons between the two sessions revealed statistically significant increases in the accuracy (p = .013, r = .35, 95% CI [0.09, 0.61]) and comprehensiveness (p = .014, r = .35, 95% CI [0.09, 0.61]) rates of true/false questions. By contrast, the accuracy (p = .180, r = .19, 95% CI [-0.09, 0.47]) and comprehensiveness (p = .180, r = .19, 95% CI [-0.09, 0.47]) rates of multiple-choice questions did not differ significantly between sessions. In addition, group comparisons of the two question formats revealed no significant difference in accuracy (p = .661, r = -0.04, 95% CI [-0.23, 0.16]) or comprehensiveness (p = .708, r = -0.04, 95% CI [-0.23, 0.16]). The consistency of ChatGPT-4o responses was supported by Cohen's kappa coefficients, all statistically significant (p < .001), with 95% confidence intervals ranging from -.038 to 1.084.
ChatGPT-4o demonstrated remarkable performance in answering multiple-choice and true/false questions, with overall accuracy improving from 86% in the first session to 95% after 14 days. ChatGPT-4o holds considerable potential as a clinical decision support tool for healthcare professionals.
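The headline comparison in the abstract can be checked with a few lines of arithmetic. The Python sketch below assumes each session's overall accuracy is computed over all 100 questions (86 and 95 correct answers) and applies the standard pooled two-proportion z-test, which treats the two sessions as independent samples. It also back-computes p-values from the reported Wilcoxon effect sizes using the common convention r = Z/√N, assuming N = 50 questions per format. The abstract states neither assumption, and the function names are illustrative only.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; treats the two samples as independent."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return z, two_sided_p(z)


def p_from_effect_size_r(r: float, n: int) -> float:
    """Back-compute a two-sided p from r = Z / sqrt(N) (assumed convention)."""
    return two_sided_p(r * math.sqrt(n))


if __name__ == "__main__":
    # Assumed counts: 86/100 correct in session 1, 95/100 in session 2.
    z, p = two_proportion_z(86, 100, 95, 100)
    print(f"overall accuracy: z = {z:.2f}, p = {p:.3f}")  # z = 2.17, p = 0.030

    # Assumed N = 50 questions per format for the between-session comparisons.
    for label, r in [("true/false", 0.35), ("multiple-choice", 0.19)]:
        print(f"{label}: p = {p_from_effect_size_r(r, 50):.3f}")
```

Under these assumptions the sketch gives z ≈ 2.17 and p ≈ .030 for overall accuracy, p ≈ .013 for the true/false comparison, and p ≈ .179 for the multiple-choice comparison, in line with the reported .030, .013, and .180. Because the same 100 questions were answered in both sessions, a paired test on discordant answers (for example, McNemar's test) is an alternative design choice; the pooled z-test shown here matches the test the authors name.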


