Evaluation of ChatGPT as a diagnostic tool for medical learners and clinicians.

Affiliations

Department of Paediatrics, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada.

Division of Nephrology, Children's Hospital, London Health Sciences Centre, London, Ontario, Canada.

Publication information

PLoS One. 2024 Jul 31;19(7):e0307383. doi: 10.1371/journal.pone.0307383. eCollection 2024.

DOI:10.1371/journal.pone.0307383

PMID:39083523

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11290643/

Abstract

BACKGROUND

ChatGPT is a large language model (LLM) trained on over 400 billion words drawn from books, articles, and websites. Because its training draws on such a large body of information, it is potentially valuable as a diagnostic aid. Moreover, its ability to understand and generate human language allows medical trainees to interact with it conversationally, which adds to its appeal as an educational resource. This study investigates ChatGPT's diagnostic accuracy and its utility in medical education.

METHODS

A total of 150 Medscape case challenges (September 2021 to January 2023) were entered into ChatGPT. The primary outcome was the number (%) of cases that ChatGPT answered correctly. Secondary outcomes included diagnostic accuracy, cognitive load, and the quality of the medical information provided. A qualitative content analysis of the responses was also conducted.

RESULTS

ChatGPT answered 49% (74/150) of cases correctly. It had an overall accuracy of 74%, a precision of 48.67%, a sensitivity of 48.67%, a specificity of 82.89%, and an AUC of 0.66. Most answers were rated as imposing a low cognitive load (51%, 77/150), and most were complete and relevant (52%, 78/150).
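The secondary metrics above can be related to one another with a short calculation. The sketch below rests on assumptions not stated in the abstract: that each Medscape case challenge offers four answer options, that ChatGPT selects exactly one option per case, and that every option is scored as a separate binary decision (150 × 4 = 600 option-level decisions). Under these assumptions the reported accuracy, precision, sensitivity, and specificity are all reproduced when 73 cases count as true positives, one fewer than the 74 in the primary outcome; the abstract does not explain that difference, so treat the mapping as illustrative only. The AUC line assumes a single operating point, for which the trapezoidal ROC area reduces to the mean of sensitivity and specificity; whether the authors computed AUC this way is also an assumption. The names `Confusion`, `metrics`, and `from_cases` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Option-level confusion matrix (hypothetical scoring scheme)."""
    tp: int  # correct option that was selected
    fp: int  # incorrect option that was selected
    fn: int  # correct option that was not selected
    tn: int  # incorrect option that was not selected


def metrics(c: Confusion) -> dict[str, float]:
    """Standard binary-classification metrics from a confusion matrix."""
    total = c.tp + c.fp + c.fn + c.tn
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": c.tp / (c.tp + c.fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # With one operating point the ROC curve is (0,0) -> (1-spec, sens) -> (1,1);
        # its trapezoidal area simplifies to (sens + spec) / 2.
        "auc": (sensitivity + specificity) / 2,
    }


def from_cases(n_cases: int, n_options: int, n_correct: int) -> Confusion:
    """Build the option-level matrix, assuming exactly one option is chosen per case."""
    wrong = n_cases - n_correct
    return Confusion(
        tp=n_correct,          # the chosen option was the right one
        fp=wrong,              # a wrong option was chosen ...
        fn=wrong,              # ... so the right option was missed
        tn=n_cases * (n_options - 1) - wrong,  # remaining wrong options left unchosen
    )


if __name__ == "__main__":
    m = metrics(from_cases(n_cases=150, n_options=4, n_correct=73))
    for name, value in m.items():
        print(f"{name}: {value:.4f}")
```

This scheme also explains why the overall accuracy (74%) is much higher than the share of cases answered correctly (49%): most option-level decisions are true negatives, which inflate accuracy and specificity without reflecting diagnostic success.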

DISCUSSION

ChatGPT in its current form is not accurate enough to serve as a diagnostic tool. Despite the vast amount of information it was trained on, ChatGPT does not necessarily give factually correct answers. Our qualitative analysis showed that ChatGPT struggles to interpret laboratory values and imaging results and may overlook key information relevant to the diagnosis. However, it still has utility as an educational tool. ChatGPT was generally correct in ruling out specific differential diagnoses and in proposing reasonable next diagnostic steps. Additionally, its answers were easy to understand, suggesting a potential benefit in simplifying complex concepts for medical learners. Our results should guide future research into harnessing ChatGPT's potential educational benefits, such as simplifying medical concepts and offering guidance on differential diagnoses and next steps.

Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cfab/11290643/62438f0134b2/pone.0307383.g001.jpg
