Borchert Robin J, Hickman Charlotte R, Pepys Jack, Sadler Timothy J
Department of Radiology, University of Cambridge, Cambridge, United Kingdom.
Department of Radiology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom.
JMIR Med Educ. 2023 Aug 7;9:e48978. doi: 10.2196/48978.
ChatGPT is a large language model that has performed well on professional examinations in the fields of medicine, law, and business. However, it is unclear how ChatGPT would perform on an examination assessing professionalism and situational judgement for doctors.
We evaluated the performance of ChatGPT on the Situational Judgement Test (SJT), a national examination taken by all final-year medical students in the United Kingdom. This examination is designed to assess attributes such as communication, teamwork, patient safety, prioritization skills, professionalism, and ethics.
All questions from the UK Foundation Programme Office's (UKFPO's) 2023 SJT practice examination were entered into ChatGPT. For each question, ChatGPT's answers and rationales were recorded and assessed against the official UKFPO scoring template. Questions were categorized into the domains of Good Medical Practice referenced in the rationales provided in the scoring sheet. Questions without clear domain links were screened by reviewers and assigned one or more domains. ChatGPT's overall performance, as well as its performance across the domains of Good Medical Practice, was evaluated.
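A minimal sketch of the evaluation loop described above, assuming the questions are scripted against the OpenAI chat API (the abstract does not specify which interface the authors used, and they may have worked through the ChatGPT web interface instead). The question file "sjt_questions.json", the question fields, and the prompt wording are hypothetical placeholders, not the authors' materials; scoring against the UKFPO template would remain a separate, manual step.

    import json

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment


    def ask_chatgpt(question_text: str) -> str:
        """Send one SJT question and return ChatGPT's answer with rationale."""
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are sitting the UK Foundation Programme "
                        "Situational Judgement Test. Rank or select the "
                        "options as instructed and explain your reasoning."
                    ),
                },
                {"role": "user", "content": question_text},
            ],
        )
        return response.choices[0].message.content


    def main() -> None:
        # Hypothetical file: one record per practice question.
        with open("sjt_questions.json") as f:
            questions = json.load(f)
        for q in questions:
            answer = ask_chatgpt(q["text"])
            # Record the answer and rationale for later scoring against
            # the official UKFPO template.
            print(q["id"], answer, sep="\n")


    if __name__ == "__main__":
        main()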
Overall, ChatGPT performed well, scoring 76% on the SJT, but it scored full marks on only a few questions (9%). This may reflect flaws in ChatGPT's situational judgement, inconsistencies in the reasoning across questions in the examination itself, or both. ChatGPT demonstrated consistent performance across the 4 outlined domains in Good Medical Practice for doctors.
Further research is needed to understand the potential applications of large language models, such as ChatGPT, in medical education, for example in standardizing questions and providing consistent rationales for examinations assessing professionalism and ethics.