Borchert Robin J, Hickman Charlotte R, Pepys Jack, Sadler Timothy J
Department of Radiology, University of Cambridge, Cambridge, United Kingdom.
Department of Radiology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom.
JMIR Med Educ. 2023 Aug 7;9:e48978. doi: 10.2196/48978.
ChatGPT is a large language model that has performed well on professional examinations in the fields of medicine, law, and business. However, it is unclear how ChatGPT would perform on an examination assessing professionalism and situational judgement for doctors.
We evaluated the performance of ChatGPT on the Situational Judgement Test (SJT), a national examination taken by all final-year medical students in the United Kingdom. This examination is designed to assess attributes such as communication, teamwork, patient safety, prioritization skills, professionalism, and ethics.
All questions from the UK Foundation Programme Office's (UKFPO's) 2023 SJT practice examination were entered into ChatGPT. For each question, ChatGPT's answers and rationales were recorded and assessed against the official UKFPO scoring template. Questions were categorized into the domains of Good Medical Practice referenced in the rationales provided in the scoring sheet. Questions without clear domain links were screened by reviewers and assigned one or more domains. ChatGPT's overall performance, as well as its performance across the domains of Good Medical Practice, was evaluated.
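A minimal sketch of the evaluation loop described above, assuming the questions are scripted against the OpenAI chat API (the abstract does not specify which interface the authors used, and they may have worked through the ChatGPT web interface instead). The question file "sjt_questions.json", the question fields, and the prompt wording are hypothetical placeholders, not the authors' materials; scoring against the UKFPO template would remain a separate, manual step.

    import json

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment


    def ask_chatgpt(question_text: str) -> str:
        """Send one SJT question and return ChatGPT's answer with rationale."""
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are sitting the UK Foundation Programme "
                        "Situational Judgement Test. Rank or select the "
                        "options as instructed and explain your reasoning."
                    ),
                },
                {"role": "user", "content": question_text},
            ],
        )
        return response.choices[0].message.content


    def main() -> None:
        # Hypothetical file: one record per practice question.
        with open("sjt_questions.json") as f:
            questions = json.load(f)
        for q in questions:
            answer = ask_chatgpt(q["text"])
            # Record the answer and rationale for later scoring against
            # the official UKFPO template.
            print(q["id"], answer, sep="\n")


    if __name__ == "__main__":
        main()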
Overall, ChatGPT performed well, scoring 76% on the SJT, but it scored full marks on only a few questions (9%). This may reflect flaws in ChatGPT's situational judgement, inconsistencies in the reasoning across questions in the examination itself, or both. ChatGPT demonstrated consistent performance across the 4 outlined domains in Good Medical Practice for doctors.
Further research is needed to understand the potential applications of large language models, such as ChatGPT, in medical education, for example in standardizing questions and providing consistent rationales for examinations assessing professionalism and ethics.