
Evaluating the limits of AI in medical specialisation: ChatGPT's performance on the UK Neurology Specialty Certificate Examination.

Author Information

Giannos Panagiotis

Affiliations

Department of Life Sciences, Imperial College London, London, UK.

Society of Meta-Research and Biomedical Innovation, London, UK.

Publication Information

BMJ Neurol Open. 2023 Jun 15;5(1):e000451. doi: 10.1136/bmjno-2023-000451. eCollection 2023.


DOI: 10.1136/bmjno-2023-000451
PMID: 37337531
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10277081/
Abstract

BACKGROUND: Large language models such as ChatGPT have demonstrated potential as innovative tools for medical education and practice, with studies showing their ability to perform at or near the passing threshold in general medical examinations and standardised admission tests. However, no studies have assessed their performance in the UK medical education context, particularly at a specialty level, and specifically in the field of neurology and neuroscience.

METHODS: We evaluated the performance of ChatGPT in higher specialty training for neurology and neuroscience using 69 questions from the Pool-Specialty Certificate Examination (SCE) Neurology Web Questions bank. The dataset primarily focused on neurology (80%). The questions spanned subtopics such as symptoms and signs, diagnosis, interpretation and management, with some questions addressing specific patient populations. The performance of ChatGPT 3.5 Legacy, ChatGPT 3.5 Default and ChatGPT-4 models was evaluated and compared.

RESULTS: ChatGPT 3.5 Legacy and ChatGPT 3.5 Default displayed overall accuracies of 42% and 57%, respectively, falling short of the passing threshold of 58% for the 2022 SCE neurology examination. ChatGPT-4, on the other hand, achieved the highest accuracy of 64%, surpassing the passing threshold and outperforming its predecessors across disciplines and subtopics.

CONCLUSIONS: The advancements in ChatGPT-4's performance compared with its predecessors demonstrate the potential for artificial intelligence (AI) models in specialised medical education and practice. However, our findings also highlight the need for ongoing development and collaboration between AI developers and medical experts to ensure the models' relevance and reliability in the rapidly evolving field of medicine.
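To make the reported accuracies concrete: an overall accuracy here is simply the share of the 69 multiple-choice questions answered with the keyed option, compared against the 58% pass mark. The short Python sketch below illustrates that calculation only; it is not the study's code, and the question IDs, answer key and model responses (answer_key, model_answers) are hypothetical placeholders.

# Illustrative sketch (not the study's actual code): score multiple-choice
# answers from several models against an answer key and compare each model's
# overall accuracy with the 2022 SCE neurology pass mark of 58%.

PASS_MARK = 0.58  # passing threshold reported for the 2022 SCE neurology exam

# Hypothetical data: question IDs mapped to the correct option letter,
# and each model's chosen option (the real study used 69 questions).
answer_key = {"q1": "B", "q2": "D", "q3": "A"}
model_answers = {
    "ChatGPT 3.5 Legacy":  {"q1": "B", "q2": "C", "q3": "C"},
    "ChatGPT 3.5 Default": {"q1": "B", "q2": "D", "q3": "C"},
    "ChatGPT-4":           {"q1": "B", "q2": "D", "q3": "A"},
}

def accuracy(answers: dict[str, str], key: dict[str, str]) -> float:
    """Fraction of questions answered with the keyed correct option."""
    correct = sum(1 for qid, option in key.items() if answers.get(qid) == option)
    return correct / len(key)

for model, answers in model_answers.items():
    acc = accuracy(answers, answer_key)
    verdict = "pass" if acc >= PASS_MARK else "fail"
    print(f"{model}: {acc:.0%} ({verdict} at a {PASS_MARK:.0%} threshold)")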

Similar Articles

[1] Evaluating the limits of AI in medical specialisation: ChatGPT's performance on the UK Neurology Specialty Certificate Examination. BMJ Neurol Open. 2023-6-15
[2] Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study. JMIR Med Educ. 2024-2-9
[3] How does ChatGPT-4 preform on non-English national medical licensing examination? An evaluation in Chinese language. PLOS Digit Health. 2023-12-1
[4] Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis. J Med Internet Res. 2024-7-25
[5] Assessment of ChatGPT-4 in Family Medicine Board Examinations Using Advanced AI Learning and Analytical Methods: Observational Study. JMIR Med Educ. 2024-10-8
[6] Assessment of ChatGPT's performance on neurology written board examination questions. BMJ Neurol Open. 2023-11-2
[7] ChatGPT's performance in German OB/GYN exams - paving the way for AI-enhanced medical education and clinical practice. Front Med (Lausanne). 2023-12-13
[8] Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study. JMIR Med Educ. 2024-4-29
[9] Performance and exploration of ChatGPT in medical examination, records and education in Chinese: Pave the way for medical AI. Int J Med Inform. 2023-9
[10] Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis. JMIR Med Educ. 2024-1-5

Cited By

[1] Evaluating the utility of ChatGPT in addressing conceptual and non-conceptual questions related to urodynamic quality control and trace analysis. Sci Rep. 2025-6-19
[2] Areas of research focus and trends in the research on the application of AIGC in healthcare. J Health Popul Nutr. 2025-6-14
[3] Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study. JMIR Med Inform. 2025-5-16
[4] ChatGPT and Other Large Language Models in Medical Education - Scoping Literature Review. Med Sci Educ. 2024-11-13
[5] Applications of Artificial Intelligence in Medical Education: A Systematic Review. Cureus. 2025-3-1
[6] ChatGPT4's diagnostic accuracy in inpatient neurology: A retrospective cohort study. Heliyon. 2024-12-9
[7] An evaluation framework for clinical use of large language models in patient interaction tasks. Nat Med. 2025-1
[8] Evaluating Large Language Models in extracting cognitive exam dates and scores. PLOS Digit Health. 2024-12-11
[9] Evaluating AI Competence in Specialized Medicine: Comparative Analysis of ChatGPT and Neurologists in a Neurology Specialist Examination in Spain. JMIR Med Educ. 2024-11-14
[10] Analyzing evaluation methods for large language models in the medical field: a scoping review. BMC Med Inform Decis Mak. 2024-11-29

References

[1] Performance of ChatGPT on UK Standardized Admission Tests: Insights From the BMAT, TMUA, LNAT, and TSA Examinations. JMIR Med Educ. 2023-4-26
[2] ChatGPT in Clinical Toxicology. JMIR Med Educ. 2023-3-8
[3] Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023-2-9
[4] AI bot ChatGPT writes smart essays - should professors worry? Nature. 2022-12-9
[5] Could AI help you to write your next paper? Nature. 2022-11
[6] Natural language processing: state of the art, current trends and challenges. Multimed Tools Appl. 2023
