
The performance of ChatGPT versus neurosurgery residents in neurosurgical board examination-like questions: a systematic review and meta-analysis.

Author Information

Bongco Edgar Dominic A, Cua Sean Kendrich N, Hernandez Mary Angeline Luz U, Pascual Juan Silvestre G, Khu Kathleen Joy O

Affiliation

Division of Neurosurgery, Department of Neurosciences, College of Medicine and Philippine General Hospital, University of the Philippines Manila, Manila, Philippines.

Publication Information

Neurosurg Rev. 2024 Dec 7;47(1):892. doi: 10.1007/s10143-024-03144-y.

Abstract

OBJECTIVE

Large language models such as ChatGPT have been used in various fields of medical education. This study aimed to review the literature on the performance of ChatGPT on neurosurgery board examination-like questions compared with that of neurosurgery residents.

METHODS

A literature search was performed following PRISMA guidelines, covering the period from ChatGPT's release (November 2022) to October 25, 2024. Two reviewers screened for eligible studies, selecting those that used ChatGPT to answer neurosurgery board examination-like questions and compared the results with neurosurgery residents' scores. Risk of bias was assessed using the JBI critical appraisal tool. Overall effect sizes and 95% confidence intervals were determined using a fixed-effects model with alpha set at 0.05.
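The abstract does not specify the software or effect measure used for pooling; the following is a minimal Python sketch of inverse-variance fixed-effect pooling with a 95% confidence interval, assuming hypothetical per-study log odds ratios and standard errors rather than the review's actual data.

    # Fixed-effect (inverse-variance) meta-analysis -- illustrative sketch only.
    # The effect sizes and standard errors below are placeholders, not data
    # extracted from the six included studies.
    import math

    studies = [(0.35, 0.12), (0.10, 0.20), (0.42, 0.15)]   # (log OR, SE), hypothetical

    weights = [1.0 / se ** 2 for _, se in studies]          # inverse-variance weights
    pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))

    z = 1.96                                                # two-sided alpha = 0.05
    ci_low, ci_high = pooled - z * pooled_se, pooled + z * pooled_se
    print(f"pooled log OR = {pooled:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")

A fixed-effect model weights each study by the inverse of its variance, which is why a single highest-weighted study can dominate the pooled estimate, as the sensitivity analysis below suggests.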

RESULTS

After screening, six studies were selected for qualitative and quantitative analysis. The accuracy of ChatGPT ranged from 50.4% to 78.8%, compared with residents' accuracy of 58.3% to 73.7%. Risk of bias was low in four of the six studies reviewed; the rest had moderate risk. There was an overall trend favoring neurosurgery residents over ChatGPT (p < 0.00001), with high heterogeneity (I² = 96%). These findings were similar on subgroup analysis of studies that used Self-Assessment in Neurosurgery (SANS) examination questions. However, on sensitivity analysis, removal of the highest-weighted study shifted the results toward better performance by ChatGPT.
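For context on the reported heterogeneity, I² is conventionally derived from Cochran's Q as I² = (Q - df)/Q × 100%; the sketch below uses a hypothetical Q chosen only to reproduce a value of 96% across six studies, not figures from the review.

    # I^2 heterogeneity statistic from Cochran's Q -- illustrative values only.
    def i_squared(q: float, num_studies: int) -> float:
        """Return I^2 as a percentage; df = number of studies - 1."""
        if q <= 0:
            return 0.0
        df = num_studies - 1
        return max(0.0, (q - df) / q) * 100.0

    print(f"I^2 = {i_squared(125.0, 6):.0f}%")              # -> I^2 = 96%

An I² of 96% indicates that nearly all of the observed variation in effect sizes reflects between-study differences rather than chance, consistent with the high heterogeneity noted in the conclusion.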

CONCLUSION

Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although the reviewed studies had high heterogeneity. Further improvement is necessary before ChatGPT can become a useful and reliable supplementary tool in the delivery of neurosurgical education.

