Suppr 超能文献



Evaluating AI Competence in Specialized Medicine: Comparative Analysis of ChatGPT and Neurologists in a Neurology Specialist Examination in Spain.

Authors

Ros-Arlanzón Pablo, Perez-Sempere Angel

Affiliations

Department of Neurology, Dr. Balmis General University Hospital, C/ Pintor Baeza, Nº 11, Alicante, 03010, Spain, 34 965933000.

Department of Neuroscience, Instituto de Investigación Sanitaria y Biomédica de Alicante, Alicante, Spain.

Publication

JMIR Med Educ. 2024 Nov 14;10:e56762. doi: 10.2196/56762.

DOI: 10.2196/56762
PMID: 39622707
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11611784/
Abstract

BACKGROUND

With the rapid advancement of artificial intelligence (AI) in various fields, evaluating its application in specialized medical contexts becomes crucial. ChatGPT, a large language model developed by OpenAI, has shown potential in diverse applications, including medicine.

OBJECTIVE

This study aims to compare the performance of ChatGPT with that of attending neurologists in a real neurology specialist examination conducted in the Valencian Community, Spain, assessing the AI's capabilities and limitations in medical knowledge.

METHODS

We conducted a comparative analysis using the 2022 neurology specialist examination results from 120 neurologists and responses generated by ChatGPT versions 3.5 and 4. The examination consisted of 80 multiple-choice questions, with a focus on clinical neurology and health legislation. Questions were classified according to Bloom's Taxonomy. Statistical analysis of performance, including the κ coefficient for response consistency, was performed.
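The κ coefficient used above for response consistency is Cohen's kappa: observed agreement between two sets of answers, corrected for the agreement expected by chance. As a minimal sketch of the statistic (not the authors' analysis code; the function name and the sample answer strings are illustrative):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two answer sequences beyond chance."""
    assert len(rater_a) == len(rater_b) and len(rater_a) > 0
    n = len(rater_a)
    # Observed agreement: fraction of items where the two sequences match
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's answer frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical multiple-choice answers (A-D) from two runs over the same 10 questions
run1 = list("ABCDABCDAA")
run2 = list("ABCDABCDAB")
print(round(cohens_kappa(run1, run2), 2))  # prints 0.86
```

A κ of 0.73 versus 0.69, as reported for ChatGPT-4 and ChatGPT-3.5, would on this scale indicate substantial but imperfect consistency when the same questions are posed repeatedly.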

RESULTS

Human participants exhibited a median score of 5.91 (IQR: 4.93-6.76), with 32 neurologists failing to pass. ChatGPT-3.5 ranked 116th out of 122, answering 54.5% of questions correctly (score 3.94). ChatGPT-4 showed marked improvement, ranking 17th with 81.8% of correct answers (score 7.57), surpassing several human specialists. No significant variations were observed in the performance on lower-order questions versus higher-order questions. Additionally, ChatGPT-4 demonstrated increased interrater reliability, as reflected by a higher κ coefficient of 0.73, compared to ChatGPT-3.5's coefficient of 0.69.

CONCLUSIONS

This study underscores the evolving capabilities of AI in medical knowledge assessment, particularly in specialized fields. ChatGPT-4's performance, outperforming the median score of human participants in a rigorous neurology examination, represents a significant milestone in AI development, suggesting its potential as an effective tool in specialized medical education and assessment.


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47b2/11611784/2d0794784226/mededu-v10-e56762-g001.jpg

Similar Articles

1. Evaluating AI Competence in Specialized Medicine: Comparative Analysis of ChatGPT and Neurologists in a Neurology Specialist Examination in Spain.
JMIR Med Educ. 2024 Nov 14;10:e56762. doi: 10.2196/56762.
2. Assessment of ChatGPT-4 in Family Medicine Board Examinations Using Advanced AI Learning and Analytical Methods: Observational Study.
JMIR Med Educ. 2024 Oct 8;10:e56128. doi: 10.2196/56128.
3. Gemini AI vs. ChatGPT: A comprehensive examination alongside ophthalmology residents in medical knowledge.
Graefes Arch Clin Exp Ophthalmol. 2025 Feb;263(2):527-536. doi: 10.1007/s00417-024-06625-4. Epub 2024 Sep 15.
4. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.
J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
5. Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom's Taxonomy.
Adv Med Educ Pract. 2024 May 10;15:393-400. doi: 10.2147/AMEP.S457408. eCollection 2024.
6. ChatGPT Earns American Board Certification in Hand Surgery.
Hand Surg Rehabil. 2024 Jun;43(3):101688. doi: 10.1016/j.hansur.2024.101688. Epub 2024 Mar 27.
7. Exploring the Performance of ChatGPT-4 in the Taiwan Audiologist Qualification Examination: Preliminary Observational Study Highlighting the Potential of AI Chatbots in Hearing Care.
JMIR Med Educ. 2024 Apr 26;10:e55595. doi: 10.2196/55595.
8. Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
9. Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT.
Clin Orthop Relat Res. 2023 Aug 1;481(8):1623-1630. doi: 10.1097/CORR.0000000000002704. Epub 2023 May 23.
10. Is artificial intelligence successful in the Turkish neurology board exam?
Neurol Res. 2025 May;47(5):402-405. doi: 10.1080/01616412.2025.2481444. Epub 2025 Mar 20.
