Evaluation of ChatGPT as a diagnostic tool for medical learners and clinicians.

Affiliations

Department of Paediatrics, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada.

Division of Nephrology, Children's Hospital, London Health Sciences Centre, London, Ontario, Canada.

Publication information

PLoS One. 2024 Jul 31;19(7):e0307383. doi: 10.1371/journal.pone.0307383. eCollection 2024.

DOI:10.1371/journal.pone.0307383
PMID:39083523
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11290643/
Abstract

BACKGROUND

ChatGPT is a large language model (LLM) trained on over 400 billion words from books, articles, and websites. Its extensive training draws from a large database of information, making it valuable as a diagnostic aid. Moreover, its capacity to comprehend and generate human language allows medical trainees to interact with it, enhancing its appeal as an educational resource. This study aims to investigate ChatGPT's diagnostic accuracy and utility in medical education.

METHODS

A total of 150 Medscape case challenges (September 2021 to January 2023) were entered into ChatGPT. The primary outcome was the number (%) of cases for which the answer given was correct. Secondary outcomes included diagnostic accuracy, cognitive load, and quality of medical information. A qualitative content analysis was also conducted to assess its responses.

RESULTS

ChatGPT answered 49% (74/150) of cases correctly. It had an overall accuracy of 74%, a precision of 48.67%, a sensitivity of 48.67%, a specificity of 82.89%, and an AUC of 0.66. Most answers were rated as low cognitive load (51%, 77/150), and most were complete and relevant (52%, 78/150).
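The accuracy, precision, sensitivity, and specificity figures above follow the standard confusion-matrix definitions. The sketch below shows how such values can be computed. The abstract does not report the underlying counts, so the numbers used here are hypothetical: they assume each of the 150 cases contributes four answer-option decisions (600 in total) and were chosen only because they reproduce the reported percentages. They should not be read as the study's data. Note that a true-positive count of 73 is what reproduces 48.67%, whereas the abstract separately reports 74/150 correct answers. The names `ConfusionCounts`, `accuracy`, `precision`, `sensitivity`, and `specificity` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of per-option decisions (hypothetical container)."""
    tp: int  # correct option selected
    fp: int  # incorrect option selected
    tn: int  # incorrect option correctly not selected
    fn: int  # correct option not selected


def accuracy(c: ConfusionCounts) -> float:
    # Share of all decisions that were right.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: ConfusionCounts) -> float:
    # Share of selected options that were correct.
    return c.tp / (c.tp + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    # Share of correct options that were selected (recall).
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of incorrect options that were not selected.
    return c.tn / (c.tn + c.fp)


# ASSUMPTION: illustrative counts, not taken from the abstract.
counts = ConfusionCounts(tp=73, fp=77, tn=373, fn=77)

for name, fn in [("accuracy", accuracy), ("precision", precision),
                 ("sensitivity", sensitivity), ("specificity", specificity)]:
    print(f"{name}: {fn(counts) * 100:.2f}%")
```

Under these definitions, precision equals sensitivity exactly when the false-positive and false-negative counts are equal, which is consistent with the two identical values of 48.67% reported above. AUC is omitted because it cannot be derived from a single set of counts.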

DISCUSSION

ChatGPT in its current form is not accurate as a diagnostic tool. Despite the vast amount of information it was trained on, ChatGPT is not necessarily factually correct. Based on our qualitative analysis, ChatGPT struggles to interpret laboratory values and imaging results, and may overlook key information relevant to the diagnosis. However, it still offers utility as an educational tool. ChatGPT was generally correct in ruling out a specific differential diagnosis and providing reasonable next diagnostic steps. Additionally, answers were easy to understand, showcasing a potential benefit in simplifying complex concepts for medical learners. Our results should guide future research into harnessing ChatGPT's potential educational benefits, such as simplifying medical concepts and offering guidance on differential diagnoses and next steps.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cfab/11290643/62438f0134b2/pone.0307383.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cfab/11290643/4e08d2f9e008/pone.0307383.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cfab/11290643/3ff102d0a992/pone.0307383.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cfab/11290643/a5567f5dbbd7/pone.0307383.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cfab/11290643/7b46170110ed/pone.0307383.g005.jpg
