Berend Kenrick, Duits Ashley, Gans Reinold O B
Department of Medicine, Curaçao Medical Center, Willemstad, Curaçao.
Institute for Medical Education, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands.
BMC Med Educ. 2025 May 22;25(1):751. doi: 10.1186/s12909-025-07235-2.
In clinical medicine, the assessment of hyponatremia is frequently required, but it is also a known source of major diagnostic errors, substantial mismanagement, and iatrogenic morbidity. Because artificial intelligence techniques are efficient at analyzing complex problems, their use may overcome current assessment limitations. There is no literature on the use of Chat Generative Pre-trained Transformer (ChatGPT-3.5) for evaluating difficult hyponatremia cases. Because of their interesting pathophysiology, hyponatremia cases are often used in medical education to train students in evaluating patients, and students increasingly use artificial intelligence as a diagnostic tool. To evaluate this possibility, four challenging, previously published hyponatremia cases were presented to the free ChatGPT-3.5 for diagnosis and treatment suggestions.
We used four challenging, previously published hyponatremia cases that had been evaluated by 46 physicians in Canada, the Netherlands, South Africa, Taiwan, and the USA. The four cases were presented twice to the free version of ChatGPT (version 3.5), in December 2023 and again in September 2024, with a request to recommend a diagnosis and therapy. The responses by ChatGPT were compared with those of the clinicians.
Cases 1 and 3 have a single cause of hyponatremia. Cases 2 and 4 have two contributing hyponatremia features. Neither ChatGPT in 2023 nor the 46 clinicians, whose assessment was described in the original publication, recognized the most crucial cause of hyponatremia, which had major therapeutic consequences, in all four cases. In 2024, ChatGPT properly diagnosed one case and suggested adequate management for it. Concurrent Addison's disease in case 1 was correctly recognized by ChatGPT in both 2023 and 2024, whereas 81% of the clinicians missed this diagnosis. ChatGPT gave no proper therapeutic recommendations for any of the four cases in 2023, but gave adequate advice for one case in 2024. Of the 46 clinicians, 65%, 57%, 2%, and 76% recommended inadequate therapy in cases 1 to 4, respectively.
Our study currently does not support the use of the free version of ChatGPT 3.5 in difficult hyponatremia cases, although a small improvement was observed after ten months with the same ChatGPT 3.5 version. Patients, health professionals, medical educators, and students should be aware of the shortcomings of the diagnosis and therapy suggestions given by ChatGPT.