

Evaluating the performance of ChatGPT-4 on the United Kingdom Medical Licensing Assessment.

Author Information

Lai U Hin, Wu Keng Sam, Hsu Ting-Yu, Kan Jessie Kai Ching

Affiliations

Sandwell and West Birmingham NHS Trust, West Bromwich, United Kingdom.

Aston Medical School, Birmingham, United Kingdom.

Publication Information

Front Med (Lausanne). 2023 Sep 19;10:1240915. doi: 10.3389/fmed.2023.1240915. eCollection 2023.


DOI: 10.3389/fmed.2023.1240915
PMID: 37795422
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10547055/
Abstract

INTRODUCTION: Recent developments in artificial intelligence large language models (LLMs), such as ChatGPT, have enabled the understanding and generation of human-like text. Studies have found that LLMs perform well in various examinations, including law, business, and medicine. This study aims to evaluate the performance of ChatGPT in the United Kingdom Medical Licensing Assessment (UKMLA).

METHODS: Two publicly available UKMLA papers consisting of 200 single-best-answer (SBA) questions were screened. Nine SBAs were omitted as they contained images that were not suitable for input. Each question was assigned a specialty based on the UKMLA content map published by the General Medical Council. A total of 191 SBAs were input into ChatGPT-4 across three attempts over the course of 3 weeks (once per week).

RESULTS: ChatGPT scored 74.9% (143/191), 78.0% (149/191) and 75.9% (145/191) on the three attempts, respectively. The average across all three attempts was 76.3% (437/573), with a 95% confidence interval of 74.46% to 78.08%. ChatGPT answered 129 SBAs correctly and 32 SBAs incorrectly on all three attempts. Across the three attempts, ChatGPT performed well in mental health (8/9 SBAs), cancer (11/14 SBAs) and cardiovascular (10/13 SBAs) questions. It did not perform well in clinical haematology (3/7 SBAs), endocrine and metabolic (2/5 SBAs), and gastrointestinal including liver (3/10 SBAs) questions. Regarding response consistency, ChatGPT consistently provided correct answers for 67.5% (129/191) of SBAs, consistently provided incorrect answers for 12.6% (24/191), and gave inconsistent responses for 19.9% (38/191).

DISCUSSION AND CONCLUSION: This study suggests that ChatGPT performs well on the UKMLA. Performance may correlate with specialty. LLMs' ability to answer SBAs correctly suggests that they could be utilised as supplementary learning tools in medical education with appropriate medical educator supervision.
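
The summary statistics above can be reproduced from the per-attempt scores alone. Below is a minimal sketch, assuming the 95% confidence interval was computed as a normal approximation (mean ± 1.96 × standard error) over the three attempt percentages; that assumption is ours, not stated in the abstract, but it reproduces the reported bounds:

```python
from statistics import mean, stdev

# Correct answers on each of the three attempts, out of 191 SBAs (from the abstract).
attempt_scores = [143, 149, 145]
n_questions = 191

percents = [100 * s / n_questions for s in attempt_scores]  # 74.87, 78.01, 75.92

m = mean(percents)                            # 76.27 -> reported as 76.3% (437/573)
se = stdev(percents) / len(percents) ** 0.5   # standard error of the mean
low, high = m - 1.96 * se, m + 1.96 * se      # normal-approximation 95% CI

print(f"mean = {m:.2f}%")                     # 76.27%
print(f"95% CI = ({low:.2f}%, {high:.2f}%)")  # (74.46%, 78.08%), as reported

# Consistency tallies over the three attempts, as reported in the abstract:
# 129 consistently correct + 24 consistently incorrect + 38 inconsistent = 191.
assert 129 + 24 + 38 == n_questions
```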


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d986/10547055/d6dafe450394/fmed-10-1240915-g001.jpg

Similar Articles

[1]
Evaluating the performance of ChatGPT-4 on the United Kingdom Medical Licensing Assessment.

Front Med (Lausanne). 2023-9-19

[2]
Performance of Generative Artificial Intelligence in Dental Licensing Examinations.

Int Dent J. 2024-6

[3]
Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.

J Med Internet Res. 2024-7-25

[4]
Could ChatGPT Pass the UK Radiology Fellowship Examinations?

Acad Radiol. 2024-5

[5]
Assessing question characteristic influences on ChatGPT's performance and response-explanation consistency: Insights from Taiwan's Nursing Licensing Exam.

Int J Nurs Stud. 2024-5

[6]
How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.

JMIR Med Educ. 2023-2-8

[7]
Performance of ChatGPT on the Peruvian National Licensing Medical Examination: Cross-Sectional Study.

JMIR Med Educ. 2023-9-28

[8]
ChatGPT-4: An assessment of an upgraded artificial intelligence chatbot in the United States Medical Licensing Examination.

Med Teach. 2024-3

[9]
A Novel Evaluation Model for Assessing ChatGPT on Otolaryngology-Head and Neck Surgery Certification Examinations: Performance Study.

JMIR Med Educ. 2024-1-16

[10]
The potential of ChatGPT in medicine: an example analysis of nephrology specialty exams in Poland.

Clin Kidney J. 2024-6-22

Cited By

[1]
Clinical applications of large language models in medicine and surgery: A scoping review.

J Int Med Res. 2025-7

[2]
Evaluation of ChatGPT Performance on Emergency Medicine Board Examination Questions: Observational Study.

JMIR AI. 2025-3-12

[3]
Chatbots' Role in Generating Single Best Answer Questions for Undergraduate Medical Student Assessment: Comparative Analysis.

JMIR Med Educ. 2025-5-30

[4]
Can ChatGPT-4o Really Pass Medical Science Exams? A Pragmatic Analysis Using Novel Questions.

Med Sci Educ. 2025-2-4

[5]
Assessing the Quality and Reliability of ChatGPT's Responses to Radiotherapy-Related Patient Queries: Comparative Study With GPT-3.5 and GPT-4.

JMIR Cancer. 2025-4-16

[6]
Assessing ChatGPT 4.0's Capabilities in the United Kingdom Medical Licensing Examination (UKMLA): A Robust Categorical Analysis.

Sci Rep. 2025-4-15

[7]
Large Language Models in Biochemistry Education: Comparative Evaluation of Performance.

JMIR Med Educ. 2025-4-10

[8]
ChatGPT and Other Large Language Models in Medical Education - Scoping Literature Review.

Med Sci Educ. 2024-11-13

[9]
Performance of ChatGPT-4 on Taiwanese Traditional Chinese Medicine Licensing Examinations: Cross-Sectional Study.

JMIR Med Educ. 2025-3-19

[10]
Transforming dental diagnostics with artificial intelligence: advanced integration of ChatGPT and large language models for patient care.

Front Dent Med. 2025-1-6

References

[1]
Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard.

JMIR Med Educ. 2024-2-21

[2]
ChatGPT Performs on the Chinese National Medical Licensing Examination.

J Med Syst. 2023-8-15

[3]
Practical Applications of ChatGPT in Undergraduate Medical Education.

J Med Educ Curric Dev. 2023-5-24

[4]
Artificial intelligence and anaesthesia examinations: exploring ChatGPT as a prelude to the future.

Br J Anaesth. 2023-8

[5]
Performance of ChatGPT on the pharmacist licensing examination in Taiwan.

J Chin Med Assoc. 2023-7-1

[6]
Chat Generative Pretrained Transformer Fails the Multiple-Choice American College of Gastroenterology Self-Assessment Test.

Am J Gastroenterol. 2023-12-1

[7]
Performance of ChatGPT on a Radiology Board-style Examination: Insights into Current Strengths and Limitations.

Radiology. 2023-6

[8]
Comparative analysis of large language models in the Royal College of Ophthalmologists fellowship exams.

Eye (Lond). 2023-12

[9]
Performance of ChatGPT on UK Standardized Admission Tests: Insights From the BMAT, TMUA, LNAT, and TSA Examinations.

JMIR Med Educ. 2023-4-26

[10]
ChatGPT - Reshaping medical education and clinical management.

Pak J Med Sci. 2023
