

Evaluating the limits of AI in medical specialisation: ChatGPT's performance on the UK Neurology Specialty Certificate Examination.

Author Information

Giannos Panagiotis

Affiliations

Department of Life Sciences, Imperial College London, London, UK.

Society of Meta-Research and Biomedical Innovation, London, UK.

Publication Information

BMJ Neurol Open. 2023 Jun 15;5(1):e000451. doi: 10.1136/bmjno-2023-000451. eCollection 2023.

DOI: 10.1136/bmjno-2023-000451
PMID: 37337531
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10277081/
Abstract

BACKGROUND

Large language models such as ChatGPT have demonstrated potential as innovative tools for medical education and practice, with studies showing their ability to perform at or near the passing threshold in general medical examinations and standardised admission tests. However, no studies have assessed their performance in the UK medical education context, particularly at a specialty level, and specifically in the field of neurology and neuroscience.

METHODS

We evaluated the performance of ChatGPT in higher specialty training for neurology and neuroscience using 69 questions from the Pool-Specialty Certificate Examination (SCE) Neurology Web Questions bank. The dataset primarily focused on neurology (80%). The questions spanned subtopics such as symptoms and signs, diagnosis, interpretation and management with some questions addressing specific patient populations. The performance of ChatGPT 3.5 Legacy, ChatGPT 3.5 Default and ChatGPT-4 models was evaluated and compared.

RESULTS

ChatGPT 3.5 Legacy and ChatGPT 3.5 Default displayed overall accuracies of 42% and 57%, respectively, falling short of the passing threshold of 58% for the 2022 SCE neurology examination. ChatGPT-4, on the other hand, achieved the highest accuracy of 64%, surpassing the passing threshold and outperforming its predecessors across disciplines and subtopics.

CONCLUSIONS

The advancements in ChatGPT-4's performance compared with its predecessors demonstrate the potential for artificial intelligence (AI) models in specialised medical education and practice. However, our findings also highlight the need for ongoing development and collaboration between AI developers and medical experts to ensure the models' relevance and reliability in the rapidly evolving field of medicine.


Similar Articles

1
Evaluating the limits of AI in medical specialisation: ChatGPT's performance on the UK Neurology Specialty Certificate Examination.
BMJ Neurol Open. 2023 Jun 15;5(1):e000451. doi: 10.1136/bmjno-2023-000451. eCollection 2023.
2
Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
3
How does ChatGPT-4 preform on non-English national medical licensing examination? An evaluation in Chinese language.
PLOS Digit Health. 2023 Dec 1;2(12):e0000397. doi: 10.1371/journal.pdig.0000397. eCollection 2023 Dec.
4
Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.
J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
5
Assessment of ChatGPT-4 in Family Medicine Board Examinations Using Advanced AI Learning and Analytical Methods: Observational Study.
JMIR Med Educ. 2024 Oct 8;10:e56128. doi: 10.2196/56128.
6
Assessment of ChatGPT's performance on neurology written board examination questions.
BMJ Neurol Open. 2023 Nov 2;5(2):e000530. doi: 10.1136/bmjno-2023-000530. eCollection 2023.
7
ChatGPT's performance in German OB/GYN exams - paving the way for AI-enhanced medical education and clinical practice.
Front Med (Lausanne). 2023 Dec 13;10:1296615. doi: 10.3389/fmed.2023.1296615. eCollection 2023.
8
Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study.
JMIR Med Educ. 2024 Apr 29;10:e55048. doi: 10.2196/55048.
9
Performance and exploration of ChatGPT in medical examination, records and education in Chinese: Pave the way for medical AI.
Int J Med Inform. 2023 Sep;177:105173. doi: 10.1016/j.ijmedinf.2023.105173. Epub 2023 Aug 4.
10
Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis.
JMIR Med Educ. 2024 Jan 5;10:e51148. doi: 10.2196/51148.

Cited By

1
Evaluating the utility of ChatGPT in addressing conceptual and non-conceptual questions related to urodynamic quality control and trace analysis.
Sci Rep. 2025 Jun 19;15(1):20091. doi: 10.1038/s41598-025-01752-2.
2
Areas of research focus and trends in the research on the application of AIGC in healthcare.
J Health Popul Nutr. 2025 Jun 14;44(1):195. doi: 10.1186/s41043-025-00947-7.
3
Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study.
JMIR Med Inform. 2025 May 16;13:e66917. doi: 10.2196/66917.
4
ChatGPT and Other Large Language Models in Medical Education - Scoping Literature Review.
Med Sci Educ. 2024 Nov 13;35(1):555-567. doi: 10.1007/s40670-024-02206-6. eCollection 2025 Feb.
5
Applications of Artificial Intelligence in Medical Education: A Systematic Review.
Cureus. 2025 Mar 1;17(3):e79878. doi: 10.7759/cureus.79878. eCollection 2025 Mar.
6
ChatGPT4's diagnostic accuracy in inpatient neurology: A retrospective cohort study.
Heliyon. 2024 Dec 9;10(24):e40964. doi: 10.1016/j.heliyon.2024.e40964. eCollection 2024 Dec 30.
7
An evaluation framework for clinical use of large language models in patient interaction tasks.
Nat Med. 2025 Jan;31(1):77-86. doi: 10.1038/s41591-024-03328-5. Epub 2025 Jan 2.
8
Evaluating Large Language Models in extracting cognitive exam dates and scores.
PLOS Digit Health. 2024 Dec 11;3(12):e0000685. doi: 10.1371/journal.pdig.0000685. eCollection 2024 Dec.
9
Evaluating AI Competence in Specialized Medicine: Comparative Analysis of ChatGPT and Neurologists in a Neurology Specialist Examination in Spain.
JMIR Med Educ. 2024 Nov 14;10:e56762. doi: 10.2196/56762.
10
Analyzing evaluation methods for large language models in the medical field: a scoping review.
BMC Med Inform Decis Mak. 2024 Nov 29;24(1):366. doi: 10.1186/s12911-024-02709-7.

References

1
Performance of ChatGPT on UK Standardized Admission Tests: Insights From the BMAT, TMUA, LNAT, and TSA Examinations.
JMIR Med Educ. 2023 Apr 26;9:e47737. doi: 10.2196/47737.
2
ChatGPT in Clinical Toxicology.
JMIR Med Educ. 2023 Mar 8;9:e46876. doi: 10.2196/46876.
3
Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models.
PLOS Digit Health. 2023 Feb 9;2(2):e0000198. doi: 10.1371/journal.pdig.0000198. eCollection 2023 Feb.
4
AI bot ChatGPT writes smart essays - should professors worry?
Nature. 2022 Dec 9. doi: 10.1038/d41586-022-04397-7.
5
Could AI help you to write your next paper?
Nature. 2022 Nov;611(7934):192-193. doi: 10.1038/d41586-022-03479-w.
6
Natural language processing: state of the art, current trends and challenges.
Multimed Tools Appl. 2023;82(3):3713-3744. doi: 10.1007/s11042-022-13428-4. Epub 2022 Jul 14.