Is ChatGPT ready for primetime? Performance of artificial intelligence on a simulated Canadian urology board exam.

Authors

Touma Naji J, Caterini Jessica, Liblik Kiera

Affiliation

Queen's University, Kingston, ON, Canada.

Publication information

Can Urol Assoc J. 2024 Oct;18(10):329-332. doi: 10.5489/cuaj.8800.

Abstract

INTRODUCTION

Generative artificial intelligence (AI) has proven to be a powerful tool with increasing applications in clinical care and medical education. ChatGPT has performed adequately on many specialty certification and knowledge assessment exams. The objective of this study was to assess the performance of ChatGPT 4 on a multiple-choice exam meant to simulate the Canadian urology board exam.

METHODS

Graduating urology residents representing all Canadian training programs gather yearly for a mock exam that simulates their upcoming board-certifying exam. The exam consists of written multiple-choice questions (MCQs) and an oral objective structured clinical examination (OSCE). The 2022 exam was taken by 29 graduating residents and was administered to ChatGPT 4.

RESULTS

ChatGPT 4 scored 46% on the MCQ exam, whereas the mean and median scores of graduating urology residents were 62.6% and 62.7%, respectively. This places ChatGPT's score 1.8 standard deviations below the median, in the sixth percentile. ChatGPT's scores by exam topic were as follows: oncology 35%, andrology/benign prostatic hyperplasia 62%, physiology/anatomy 67%, incontinence/female urology 23%, infections 71%, urolithiasis 57%, and trauma/reconstruction 17%. ChatGPT 4's oncology performance was significantly below that of postgraduate year 5 residents.
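The arithmetic behind the standard-deviation figure can be sketched as follows. This is an illustrative check, not the authors' method: the abstract does not report the residents' standard deviation, so the sketch backs out the value implied by the reported numbers, and it assumes the 1.8 standard deviations lie below the median. The normal-curve percentile is likewise an assumption added here for comparison; the abstract does not say how its sixth-percentile rank was computed, so the two figures need not agree. The function names are hypothetical.

```python
from statistics import NormalDist

# Figures reported in the abstract.
chatgpt_score = 46.0     # ChatGPT 4 MCQ score, in percent
resident_median = 62.7   # median score of graduating residents, in percent

# Assumption: the reported 1.8 standard deviations are below the median.
z_reported = -1.8


def implied_sd(score: float, center: float, z: float) -> float:
    """Standard deviation implied by a score, a reference value, and a z-score."""
    return (score - center) / z


def normal_percentile(z: float) -> float:
    """Percentile of a z-score under a normal distribution (an assumption here)."""
    return 100.0 * NormalDist().cdf(z)


sd = implied_sd(chatgpt_score, resident_median, z_reported)
pct = normal_percentile(z_reported)

print(f"Implied resident SD: {sd:.1f} percentage points")
print(f"Percentile under a normal approximation: {pct:.1f}")
```

Under these assumptions the implied spread of resident scores is a little over 9 percentage points. A rank computed directly from the 29 resident scores would not have to match the normal-curve percentile.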

CONCLUSIONS

ChatGPT 4 underperforms on an MCQ exam meant to simulate the Canadian board exam. Ongoing assessments of the capabilities of generative AI are needed as these models evolve and are trained on additional urology content.
