Guerra Gage A, Hofmann Hayden, Sobhani Sina, Hofmann Grady, Gomez David, Soroudi Daniel, Hopkins Benjamin S, Dallas Jonathan, Pangal Dhiraj J, Cheok Stephanie, Nguyen Vincent N, Mack William J, Zada Gabriel
Department of Neurosurgery, University of Southern California, Los Angeles, California, USA.
World Neurosurg. 2023 Nov;179:e160-e165. doi: 10.1016/j.wneu.2023.08.042. Epub 2023 Aug 18.
Artificial intelligence (AI) and machine learning have transformed health care, with applications across many specialized fields. Neurosurgery stands to benefit from AI in surgical planning, prediction of patient outcomes, and analysis of neuroimaging data. GPT-4, an updated large language model trained with additional parameters, has exhibited exceptional performance on standardized examinations. This study examines GPT-4's competence on neurosurgical board-style questions, comparing its performance with that of medical students and residents, to explore its potential in medical education and clinical decision-making.
GPT-4's performance was examined on 643 Congress of Neurological Surgeons (CNS) Self-Assessment Neurosurgery Exam (SANS) board-style questions drawn from various neurosurgery subspecialties. Of these, 477 were text-based and 166 contained images. GPT-4 refused to answer 52 questions that contained no text. The remaining 591 questions were input into GPT-4, and its performance was evaluated based on first-time responses. Raw scores were analyzed across subspecialties and question types and then compared with previous findings on the performance of ChatGPT (Chat Generative Pre-trained Transformer) against SANS users, medical students, and neurosurgery residents.
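As a rough illustration of the first-response scoring protocol described above, the sketch below queries a model once per question and grades only the first reply. This is a minimal sketch, not the authors' code: the study does not specify its querying mechanism, and the OpenAI Python client usage, the questions list, and the single-letter grading heuristic are all assumptions introduced here for illustration.

# Minimal sketch of first-time-response scoring (assumed OpenAI Python client).
# The `questions` list and its fields are hypothetical stand-ins for SANS items.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

questions = [
    {"stem": "A 45-year-old presents with ...", "choices": "A) ... B) ... C) ... D) ...", "answer": "B"},
    # 591 text-containing items in the actual study
]

correct = 0
for q in questions:
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"{q['stem']}\n{q['choices']}\nAnswer with a single letter.",
        }],
    )
    first_response = reply.choices[0].message.content.strip()
    if first_response.upper().startswith(q["answer"]):  # grade the first attempt only
        correct += 1

print(f"Accuracy: {correct / len(questions):.1%}")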
GPT-4 attempted 91.9% of the CNS SANS questions (591 of 643) and achieved 76.6% accuracy; accuracy rose to 79.0% on text-only questions. GPT-4 outperformed ChatGPT (P < 0.001) and scored highest in the pain/peripheral nerve category (84%) and lowest in the spine category (73%). It exceeded the performance of medical students (26.3%), neurosurgery residents (61.5%), and the national average of SANS users (69.3%) across all categories.
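The abstract does not name the statistical test behind the P < 0.001 comparison with ChatGPT; a two-proportion chi-square test on correct/incorrect counts is one plausible reconstruction, sketched below. The GPT-4 counts reproduce the reported 76.6% accuracy over 591 attempted questions; the ChatGPT counts are illustrative placeholders only, not the study's data.

# Hedged sketch of a chi-square test of independence on a 2x2 table.
from scipy.stats import chi2_contingency

gpt4 = [453, 591 - 453]      # [correct, incorrect]; 453/591 = 76.6%
chatgpt = [380, 591 - 380]   # hypothetical counts for the comparison model

chi2, p, dof, expected = chi2_contingency([gpt4, chatgpt])
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")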
GPT-4 significantly outperformed medical students, neurosurgery residents, and the national average of SANS users. The model's accuracy suggests potential applications in educational settings and clinical decision-making, enhancing provider efficiency and improving patient care.