Challenging cases of hyponatremia incorrectly interpreted by ChatGPT.

Authors

Berend Kenrick, Duits Ashley, Gans Reinold O B

Affiliations

Department of Medicine, Curaçao Medical Center, Willemstad, Curaçao.

Institute for Medical Education, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands.

Publication information

BMC Med Educ. 2025 May 22;25(1):751. doi: 10.1186/s12909-025-07235-2.

DOI:10.1186/s12909-025-07235-2

PMID:40405178

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12100905/

Abstract

BACKGROUND

In clinical medicine, hyponatremia must be assessed frequently, yet it is also a known source of major diagnostic errors, substantial mismanagement, and iatrogenic morbidity. Because artificial intelligence techniques are efficient in analyzing complex problems, their use may overcome current limitations in assessment. There is no literature on the use of Chat Generative Pre-trained Transformer (ChatGPT-3.5) for evaluating difficult hyponatremia cases. Because of their interesting pathophysiology, hyponatremia cases are often used in medical education to train students in evaluating patients, and students increasingly use artificial intelligence as a diagnostic tool. To evaluate this possibility, four previously published challenging hyponatremia cases were presented to the free ChatGPT-3.5 for diagnosis and treatment suggestions.

METHODS

We used four previously published challenging hyponatremia cases that had been evaluated by 46 physicians in Canada, the Netherlands, South Africa, Taiwan, and the USA. The four cases were presented twice to the free version of ChatGPT (version 3.5), in December 2023 and again in September 2024, with a request to recommend a diagnosis and therapy. The responses of ChatGPT were compared with those of the clinicians.

RESULTS

Cases 1 and 3 each have a single cause of hyponatremia. Cases 2 and 4 each have two contributing hyponatremia features. Neither ChatGPT in 2023 nor the 46 clinicians, whose assessment was described in the original publication, recognized the most crucial cause of hyponatremia, the one with major therapeutic consequences, in all four cases. In 2024, ChatGPT made the proper diagnosis and suggested adequate management in one case. Concurrent Addison's disease was correctly recognized in case 1 by ChatGPT in both 2023 and 2024, whereas 81% of the clinicians missed this diagnosis. ChatGPT gave no proper therapeutic recommendations in 2023 in any of the four cases, but gave adequate advice in one case in 2024. Among the 46 clinicians, inadequate therapy was recommended in 65%, 57%, 2%, and 76% for cases 1 to 4, respectively.

CONCLUSION

Our study does not currently support the use of the free version of ChatGPT 3.5 in difficult hyponatremia cases, although a small improvement was observed after ten months with the same ChatGPT 3.5 version. Patients, health professionals, medical educators, and students should be aware of the shortcomings of the diagnosis and therapy suggestions given by ChatGPT.

Similar articles

A retrospective evaluation of the potential of ChatGPT in the accurate diagnosis of acute stroke.

Diagn Interv Radiol. 2025 Apr 28;31(3):187-195. doi: 10.4274/dir.2024.242892. Epub 2024 Sep 2.

The ChatGPT effect and transforming nursing education with generative AI: Discussion paper.

Nurse Educ Pract. 2024 Feb;75:103888. doi: 10.1016/j.nepr.2024.103888. Epub 2024 Jan 10.

Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.

JMIR Form Res. 2025 May 20;9:e63857. doi: 10.2196/63857.

Assessing Familiarity, Usage Patterns, and Attitudes of Medical Students Toward ChatGPT and Other Chat-Based AI Apps in Medical Education: Cross-Sectional Questionnaire Study.

JMIR Med Educ. 2025 Jan 30;11:e63065. doi: 10.2196/63065.

How do we teach generative artificial intelligence to medical educators? Pilot of a faculty development workshop using ChatGPT.

Med Teach. 2025 Jan;47(1):160-162. doi: 10.1080/0142159X.2024.2341806. Epub 2024 Apr 22.

AI-powered standardised patients: evaluating ChatGPT-4o's impact on clinical case management in intern physicians.

BMC Med Educ. 2025 Feb 20;25(1):278. doi: 10.1186/s12909-025-06877-6.

Navigating the future of pediatric cardiovascular surgery: Insights and innovation powered by Chat Generative Pre-Trained Transformer (ChatGPT).

J Thorac Cardiovasc Surg. 2025 Feb 1. doi: 10.1016/j.jtcvs.2025.01.022.

The impact of Chat Generative Pre-trained Transformer (ChatGPT) on medical education.

Postgrad Med J. 2023 Sep 21;99(1176):1125-1127. doi: 10.1093/postmj/qgad058.

Evaluating the accuracy of Chat Generative Pre-trained Transformer version 4 (ChatGPT-4) responses to United States Food and Drug Administration (FDA) frequently asked questions about dental amalgam.

BMC Oral Health. 2024 May 24;24(1):605. doi: 10.1186/s12903-024-04358-8.
