
GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions.

Authors

Guerra Gage A, Hofmann Hayden, Sobhani Sina, Hofmann Grady, Gomez David, Soroudi Daniel, Hopkins Benjamin S, Dallas Jonathan, Pangal Dhiraj J, Cheok Stephanie, Nguyen Vincent N, Mack William J, Zada Gabriel

Affiliations

Department of Neurosurgery, University of Southern California, Los Angeles, California, USA.


Publication Information

World Neurosurg. 2023 Nov;179:e160-e165. doi: 10.1016/j.wneu.2023.08.042. Epub 2023 Aug 18.

DOI: 10.1016/j.wneu.2023.08.042
PMID: 37597659
Abstract

BACKGROUND

Artificial intelligence (AI) and machine learning have transformed health care with applications in various specialized fields. Neurosurgery can benefit from artificial intelligence in surgical planning, predicting patient outcomes, and analyzing neuroimaging data. GPT-4, an updated language model with additional training parameters, has exhibited exceptional performance on standardized exams. This study examines GPT-4's competence on neurosurgical board-style questions, comparing its performance with medical students and residents, to explore its potential in medical education and clinical decision-making.

METHODS

GPT-4's performance was examined on 643 Congress of Neurological Surgeons Self-Assessment Neurosurgery Exam (SANS) board-style questions from various neurosurgery subspecialties. Of these, 477 were text-based and 166 contained images. GPT-4 refused to answer 52 questions that contained no text. The remaining 591 questions were inputted into GPT-4, and its performance was evaluated based on first-time responses. Raw scores were analyzed across subspecialties and question types, and then compared to previous findings on Chat Generative pre-trained transformer performance against SANS users, medical students, and neurosurgery residents.

RESULTS

GPT-4 attempted 91.9% of Congress of Neurological Surgeons SANS questions and achieved 76.6% accuracy. The model's accuracy increased to 79.0% for text-only questions. GPT-4 outperformed Chat Generative pre-trained transformer (P < 0.001) and scored highest in pain/peripheral nerve (84%) and lowest in spine (73%) categories. It exceeded the performance of medical students (26.3%), neurosurgery residents (61.5%), and the national average of SANS users (69.3%) across all categories.
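The question counts in the Methods and the attempt rate in the Results are internally consistent, which can be verified with a quick calculation (a minimal sketch using only the figures stated in the abstract above):

```python
# Cross-check of the question counts and attempt rate reported in the abstract.
total_questions = 643   # all CNS SANS board-style questions examined
refused = 52            # image-only questions with no text, which GPT-4 declined
attempted = total_questions - refused

# Matches the 591 questions inputted into GPT-4 per the Methods.
assert attempted == 591

# Attempt rate: 591 / 643, which rounds to the 91.9% reported in the Results.
attempt_rate = round(100 * attempted / total_questions, 1)
print(attempt_rate)  # 91.9
```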

CONCLUSIONS

GPT-4 significantly outperformed medical students, neurosurgery residents, and the national average of SANS users. The model's accuracy suggests potential applications in educational settings and clinical decision-making, enhancing provider efficiency and improving patient care.


Similar Articles

1
GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions.
World Neurosurg. 2023 Nov;179:e160-e165. doi: 10.1016/j.wneu.2023.08.042. Epub 2023 Aug 18.
2
Performance of ChatGPT and GPT-4 on Neurosurgery Written Board Examinations.
Neurosurgery. 2023 Dec 1;93(6):1353-1365. doi: 10.1227/neu.0000000000002632. Epub 2023 Aug 15.
3
The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.
Orthopedics. 2024 Mar-Apr;47(2):e85-e89. doi: 10.3928/01477447-20230922-05. Epub 2023 Sep 27.
4
Performance of ChatGPT, GPT-4, and Google Bard on a Neurosurgery Oral Boards Preparation Question Bank.
Neurosurgery. 2023 Nov 1;93(5):1090-1098. doi: 10.1227/neu.0000000000002551. Epub 2023 Jun 12.
5
Advancing Medical Education: Performance of Generative Artificial Intelligence Models on Otolaryngology Board Preparation Questions With Image Analysis Insights.
Cureus. 2024 Jul 9;16(7):e64204. doi: 10.7759/cureus.64204. eCollection 2024 Jul.
6
Large Language Model-Based Neurosurgical Evaluation Matrix: A Novel Scoring Criteria to Assess the Efficacy of ChatGPT as an Educational Tool for Neurosurgery Board Preparation.
World Neurosurg. 2023 Dec;180:e765-e773. doi: 10.1016/j.wneu.2023.10.043. Epub 2023 Oct 14.
7
Comparison of ChatGPT-3.5, ChatGPT-4, and Orthopaedic Resident Performance on Orthopaedic Assessment Examinations.
J Am Acad Orthop Surg. 2023 Dec 1;31(23):1173-1179. doi: 10.5435/JAAOS-D-23-00396. Epub 2023 Sep 4.
8
Letter to the Editor Regarding: "GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions".
World Neurosurg. 2024 Apr;184:351. doi: 10.1016/j.wneu.2024.01.155.
9
Comparison of the Performance of GPT-3.5 and GPT-4 With That of Medical Students on the Written German Medical Licensing Examination: Observational Study.
JMIR Med Educ. 2024 Feb 8;10:e50965. doi: 10.2196/50965.
10
Educational Limitations of ChatGPT in Neurosurgery Board Preparation.
Cureus. 2024 Apr 20;16(4):e58639. doi: 10.7759/cureus.58639. eCollection 2024 Apr.

Cited By

1
Exploring perspectives and boundaries in neurosurgical career pathways for generation Z in German-speaking countries.
Brain Spine. 2025 Aug 6;5:104382. doi: 10.1016/j.bas.2025.104382. eCollection 2025.
2
Postoperative complication management: How do large language models measure up to human expertise?
PLOS Digit Health. 2025 Aug 1;4(8):e0000933. doi: 10.1371/journal.pdig.0000933. eCollection 2025 Aug.
3
Evaluating multiple large language models on orbital diseases.
Front Cell Dev Biol. 2025 Jul 7;13:1574378. doi: 10.3389/fcell.2025.1574378. eCollection 2025.
4
Evaluating the role of large language models in traditional Chinese medicine diagnosis and treatment recommendations.
NPJ Digit Med. 2025 Jul 21;8(1):466. doi: 10.1038/s41746-025-01845-2.
5
OpenAI o1 Large Language Model Outperforms GPT-4o, Gemini 1.5 Flash, and Human Test Takers on Ophthalmology Board-Style Questions.
Ophthalmol Sci. 2025 Jun 6;5(6):100844. doi: 10.1016/j.xops.2025.100844. eCollection 2025 Nov-Dec.
6
ChatGPT performance in answering medical residency questions in nephrology: a pilot study in Brazil.
J Bras Nefrol. 2025 Oct-Dec;47(4):e20240254. doi: 10.1590/2175-8239-JBN-2024-0254en.
7
A comparative analysis of DeepSeek R1, DeepSeek-R1-Lite, OpenAi o1 Pro, and Grok 3 performance on ophthalmology board-style questions.
Sci Rep. 2025 Jul 2;15(1):23101. doi: 10.1038/s41598-025-08601-2.
8
Evaluating the accuracy of advanced language learning models in ophthalmology: A comparative study of ChatGPT-4o and Meta AI's Llama 3.1.
Adv Ophthalmol Pract Res. 2025 Jan 6;5(2):95-99. doi: 10.1016/j.aopr.2025.01.002. eCollection 2025 May-Jun.
9
Harnessing GPT-4 for automated error detection in pathology reports: Implications for oncology diagnostics.
Digit Health. 2025 May 29;11:20552076251346703. doi: 10.1177/20552076251346703. eCollection 2025 Jan-Dec.
10
Artificial intelligence in clinical practice: a cross-sectional survey of paediatric surgery residents' perspectives.
BMJ Health Care Inform. 2025 May 21;32(1):e101456. doi: 10.1136/bmjhci-2025-101456.