Evaluating the competency of ChatGPT in MRCP Part 1 and a systematic literature review of its capabilities in postgraduate medical assessments.

Affiliations

Guy's Hospital, Guy's and St Thomas' NHS Foundation Trust, Great Maze Pond, London, United Kingdom.

Basel, Switzerland.

Publication information

PLoS One. 2024 Jul 31;19(7):e0307372. doi: 10.1371/journal.pone.0307372. eCollection 2024.

DOI:10.1371/journal.pone.0307372
PMID:39083455
Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11290618/
Abstract

OBJECTIVES

As a large language model (LLM) trained on a large data set, ChatGPT can perform a wide array of tasks without additional training. We evaluated the performance of ChatGPT on UK postgraduate medical examinations in two ways: through a systematic literature review of its performance in UK postgraduate medical assessments, and by testing it on the Membership of the Royal Colleges of Physicians (MRCP) Part 1 examination.

METHODS

Medline, Embase and Cochrane databases were searched. Articles discussing the performance of ChatGPT in UK postgraduate medical examinations were included in the systematic review. Information was extracted on exam performance, including percentage scores and pass/fail rates. MRCP UK Part 1 sample paper questions were entered into ChatGPT-3.5 and ChatGPT-4 four times each, and the responses were marked against the correct answers provided.

RESULTS

Twelve studies were ultimately included in the systematic literature review. ChatGPT-3.5 scored 66.4% and ChatGPT-4 scored 84.8% on the MRCP Part 1 sample paper, which is 4.4 and 22.8 percentage points above the historical pass mark respectively. The performance of both ChatGPT-3.5 and ChatGPT-4 was significantly above the historical pass mark for MRCP Part 1, indicating they would likely pass this examination. ChatGPT-3.5 failed eight out of nine postgraduate exams it was tested on, scoring on average 5.0% below the pass mark. ChatGPT-4 passed nine out of eleven postgraduate exams it was tested on, scoring on average 13.56% above the pass mark. ChatGPT-4 performance was significantly better than ChatGPT-3.5 in all examinations that both models were tested on.
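The abstract does not state the historical pass mark itself, but it can be recovered from the reported scores and margins. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative and not taken from the paper, and it assumes the reported margins are simple differences between a score and the pass mark.

```python
# Scores reported in the abstract (percent correct on the MRCP Part 1 sample paper).
reported_scores = {"ChatGPT-3.5": 66.4, "ChatGPT-4": 84.8}

# Margins above the historical pass mark, as reported in the abstract.
reported_margins = {"ChatGPT-3.5": 4.4, "ChatGPT-4": 22.8}


def implied_pass_mark(score: float, margin: float) -> float:
    """Pass mark implied by a score and its margin, assuming margin = score - pass mark."""
    return round(score - margin, 1)


# Both models should imply the same pass mark if the margins are plain differences.
implied = {
    model: implied_pass_mark(reported_scores[model], reported_margins[model])
    for model in reported_scores
}

for model, pass_mark in implied.items():
    print(f"{model}: implied pass mark = {pass_mark}%")
```

Both models imply the same pass mark of 62.0%, which is consistent with reading the reported margins as differences in score rather than as relative percentages.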

CONCLUSION

ChatGPT-4 performed at above passing level for the majority of UK postgraduate medical examinations it was tested on. ChatGPT is prone to hallucinations, fabrications and reduced explanation accuracy which could limit its potential as a learning tool. The potential for these errors is an inherent part of LLMs and may always be a limitation for medical applications of ChatGPT.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b78d/11290618/185e2bb9d736/pone.0307372.g001.jpg

Similar articles

1
Evaluating the competency of ChatGPT in MRCP Part 1 and a systematic literature review of its capabilities in postgraduate medical assessments.
PLoS One. 2024 Jul 31;19(7):e0307372. doi: 10.1371/journal.pone.0307372. eCollection 2024.
2
Can ChatGPT pass the MRCP (UK) written examinations? Analysis of performance and errors using a clinical decision-reasoning framework.
BMJ Open. 2024 Mar 15;14(3):e080558. doi: 10.1136/bmjopen-2023-080558.
3
PLAB and UK graduates' performance on MRCP(UK) and MRCGP examinations: data linkage study.
BMJ. 2014 Apr 17;348:g2621. doi: 10.1136/bmj.g2621.
4
Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
5
Could ChatGPT Pass the UK Radiology Fellowship Examinations?
Acad Radiol. 2024 May;31(5):2178-2182. doi: 10.1016/j.acra.2023.11.026. Epub 2023 Dec 29.
6
Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.
J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
7
Fitness to practise sanctions in UK doctors are predicted by poor performance at MRCGP and MRCP(UK) assessments: data linkage study.
BMC Med. 2018 Dec 7;16(1):230. doi: 10.1186/s12916-018-1214-4.
8
A Novel Evaluation Model for Assessing ChatGPT on Otolaryngology-Head and Neck Surgery Certification Examinations: Performance Study.
JMIR Med Educ. 2024 Jan 16;10:e49970. doi: 10.2196/49970.
9
Resitting a high-stakes postgraduate medical examination on multiple occasions: nonlinear multilevel modelling of performance in the MRCP(UK) examinations.
BMC Med. 2012 Jun 14;10:60. doi: 10.1186/1741-7015-10-60.
10
Changes in standard of candidates taking the MRCP(UK) Part 1 examination, 1985 to 2002: analysis of marker questions.
BMC Med. 2005 Jul 18;3:13. doi: 10.1186/1741-7015-3-13.

Cited by

1
Use and Evaluation of Generative Artificial Intelligence by Medical Students in Japan.
JMA J. 2025 Jul 15;8(3):730-735. doi: 10.31662/jmaj.2024-0375. Epub 2025 Jul 2.
2
Bridging Gaps in Cancer Care: Utilizing Large Language Models for Accessible Dietary Recommendations.
Nutrients. 2025 Mar 28;17(7):1176. doi: 10.3390/nu17071176.
3
Higher education students' perceptions of ChatGPT: A global study of early reactions.
PLoS One. 2025 Feb 5;20(2):e0315011. doi: 10.1371/journal.pone.0315011. eCollection 2025.

References

1
ChatGPT's Response Consistency: A Study on Repeated Queries of Medical Examination Questions.
Eur J Investig Health Psychol Educ. 2024 Mar 8;14(3):657-668. doi: 10.3390/ejihpe14030043.
2
Diagnostic reasoning prompts reveal the potential for large language model interpretability in medicine.
NPJ Digit Med. 2024 Jan 24;7(1):20. doi: 10.1038/s41746-024-01010-1.
3
Could ChatGPT Pass the UK Radiology Fellowship Examinations?
Acad Radiol. 2024 May;31(5):2178-2182. doi: 10.1016/j.acra.2023.11.026. Epub 2023 Dec 29.
4
Performance of Generative Pre-trained Transformer-4 (GPT-4) in Membership of the Royal College of General Practitioners (MRCGP)-style examination questions.
Postgrad Med J. 2024 Mar 18;100(1182):274-275. doi: 10.1093/postmj/qgad128.
5
GPT-4 can pass the Korean National Licensing Examination for Korean Medicine Doctors.
PLOS Digit Health. 2023 Dec 15;2(12):e0000416. doi: 10.1371/journal.pdig.0000416. eCollection 2023 Dec.
6
ChatGPT in Iranian medical licensing examination: evaluating the diagnostic accuracy and decision-making capabilities of an AI-based model.
BMJ Health Care Inform. 2023 Dec 11;30(1):e100815. doi: 10.1136/bmjhci-2023-100815.
7
Evaluation of Large language model performance on the Multi-Specialty Recruitment Assessment (MSRA) exam.
Comput Biol Med. 2024 Jan;168:107794. doi: 10.1016/j.compbiomed.2023.107794. Epub 2023 Nov 30.
8
How does ChatGPT-4 preform on non-English national medical licensing examination? An evaluation in Chinese language.
PLOS Digit Health. 2023 Dec 1;2(12):e0000397. doi: 10.1371/journal.pdig.0000397. eCollection 2023 Dec.
9
Performance of large language models at the MRCS Part A: a tool for medical education?
Ann R Coll Surg Engl. 2023 Dec 1. doi: 10.1308/rcsann.2023.0085.
10
Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study.
J Educ Eval Health Prof. 2023;20:30. doi: 10.3352/jeehp.2023.20.30. Epub 2023 Nov 20.