
GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions.

Authors

Guerra Gage A, Hofmann Hayden, Sobhani Sina, Hofmann Grady, Gomez David, Soroudi Daniel, Hopkins Benjamin S, Dallas Jonathan, Pangal Dhiraj J, Cheok Stephanie, Nguyen Vincent N, Mack William J, Zada Gabriel

Affiliations

Department of Neurosurgery, University of Southern California, Los Angeles, California, USA.


Publication Information

World Neurosurg. 2023 Nov;179:e160-e165. doi: 10.1016/j.wneu.2023.08.042. Epub 2023 Aug 18.

DOI: 10.1016/j.wneu.2023.08.042
PMID: 37597659
Abstract

BACKGROUND

Artificial intelligence (AI) and machine learning have transformed health care with applications in various specialized fields. Neurosurgery can benefit from artificial intelligence in surgical planning, predicting patient outcomes, and analyzing neuroimaging data. GPT-4, an updated language model with additional training parameters, has exhibited exceptional performance on standardized exams. This study examines GPT-4's competence on neurosurgical board-style questions, comparing its performance with medical students and residents, to explore its potential in medical education and clinical decision-making.

METHODS

GPT-4's performance was examined on 643 Congress of Neurological Surgeons Self-Assessment Neurosurgery Exam (SANS) board-style questions from various neurosurgery subspecialties. Of these, 477 were text-based and 166 contained images. GPT-4 refused to answer 52 questions that contained no text. The remaining 591 questions were inputted into GPT-4, and its performance was evaluated based on first-time responses. Raw scores were analyzed across subspecialties and question types, and then compared to previous findings on Chat Generative pre-trained transformer performance against SANS users, medical students, and neurosurgery residents.

RESULTS

GPT-4 attempted 91.9% of Congress of Neurological Surgeons SANS questions and achieved 76.6% accuracy. The model's accuracy increased to 79.0% for text-only questions. GPT-4 outperformed Chat Generative pre-trained transformer (P < 0.001) and scored highest in pain/peripheral nerve (84%) and lowest in spine (73%) categories. It exceeded the performance of medical students (26.3%), neurosurgery residents (61.5%), and the national average of SANS users (69.3%) across all categories.
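The question counts in the Methods and the attempt rate in the Results are internally consistent, which can be verified with a quick calculation (a minimal sketch using only the figures stated in the abstract above):

```python
# Cross-check of the question counts and attempt rate reported in the abstract.
total_questions = 643   # all CNS SANS board-style questions examined
refused = 52            # image-only questions with no text, which GPT-4 declined
attempted = total_questions - refused

# Matches the 591 questions inputted into GPT-4 per the Methods.
assert attempted == 591

# Attempt rate: 591 / 643, which rounds to the 91.9% reported in the Results.
attempt_rate = round(100 * attempted / total_questions, 1)
print(attempt_rate)  # 91.9
```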

CONCLUSIONS

GPT-4 significantly outperformed medical students, neurosurgery residents, and the national average of SANS users. The model's accuracy suggests potential applications in educational settings and clinical decision-making, enhancing provider efficiency and improving patient care.


Similar Articles

1
GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions.
World Neurosurg. 2023 Nov;179:e160-e165. doi: 10.1016/j.wneu.2023.08.042. Epub 2023 Aug 18.
2
Performance of ChatGPT and GPT-4 on Neurosurgery Written Board Examinations.
Neurosurgery. 2023 Dec 1;93(6):1353-1365. doi: 10.1227/neu.0000000000002632. Epub 2023 Aug 15.
3
The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.
Orthopedics. 2024 Mar-Apr;47(2):e85-e89. doi: 10.3928/01477447-20230922-05. Epub 2023 Sep 27.
4
Performance of ChatGPT, GPT-4, and Google Bard on a Neurosurgery Oral Boards Preparation Question Bank.
Neurosurgery. 2023 Nov 1;93(5):1090-1098. doi: 10.1227/neu.0000000000002551. Epub 2023 Jun 12.
5
Advancing Medical Education: Performance of Generative Artificial Intelligence Models on Otolaryngology Board Preparation Questions With Image Analysis Insights.
Cureus. 2024 Jul 9;16(7):e64204. doi: 10.7759/cureus.64204. eCollection 2024 Jul.
6
Large Language Model-Based Neurosurgical Evaluation Matrix: A Novel Scoring Criteria to Assess the Efficacy of ChatGPT as an Educational Tool for Neurosurgery Board Preparation.
World Neurosurg. 2023 Dec;180:e765-e773. doi: 10.1016/j.wneu.2023.10.043. Epub 2023 Oct 14.
7
Comparison of ChatGPT-3.5, ChatGPT-4, and Orthopaedic Resident Performance on Orthopaedic Assessment Examinations.
J Am Acad Orthop Surg. 2023 Dec 1;31(23):1173-1179. doi: 10.5435/JAAOS-D-23-00396. Epub 2023 Sep 4.
8
Letter to the Editor Regarding: "GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions".
World Neurosurg. 2024 Apr;184:351. doi: 10.1016/j.wneu.2024.01.155.
9
Comparison of the Performance of GPT-3.5 and GPT-4 With That of Medical Students on the Written German Medical Licensing Examination: Observational Study.
JMIR Med Educ. 2024 Feb 8;10:e50965. doi: 10.2196/50965.
10
Educational Limitations of ChatGPT in Neurosurgery Board Preparation.
Cureus. 2024 Apr 20;16(4):e58639. doi: 10.7759/cureus.58639. eCollection 2024 Apr.

Cited By

1
Exploring perspectives and boundaries in neurosurgical career pathways for generation Z in German-speaking countries.
Brain Spine. 2025 Aug 6;5:104382. doi: 10.1016/j.bas.2025.104382. eCollection 2025.
2
Postoperative complication management: How do large language models measure up to human expertise?
PLOS Digit Health. 2025 Aug 1;4(8):e0000933. doi: 10.1371/journal.pdig.0000933. eCollection 2025 Aug.
3
Evaluating multiple large language models on orbital diseases.
Front Cell Dev Biol. 2025 Jul 7;13:1574378. doi: 10.3389/fcell.2025.1574378. eCollection 2025.
4
Evaluating the role of large language models in traditional Chinese medicine diagnosis and treatment recommendations.
NPJ Digit Med. 2025 Jul 21;8(1):466. doi: 10.1038/s41746-025-01845-2.
5
OpenAI o1 Large Language Model Outperforms GPT-4o, Gemini 1.5 Flash, and Human Test Takers on Ophthalmology Board-Style Questions.
Ophthalmol Sci. 2025 Jun 6;5(6):100844. doi: 10.1016/j.xops.2025.100844. eCollection 2025 Nov-Dec.
6
ChatGPT performance in answering medical residency questions in nephrology: a pilot study in Brazil.
J Bras Nefrol. 2025 Oct-Dec;47(4):e20240254. doi: 10.1590/2175-8239-JBN-2024-0254en.
7
A comparative analysis of DeepSeek R1, DeepSeek-R1-Lite, OpenAi o1 Pro, and Grok 3 performance on ophthalmology board-style questions.
Sci Rep. 2025 Jul 2;15(1):23101. doi: 10.1038/s41598-025-08601-2.
8
Evaluating the accuracy of advanced language learning models in ophthalmology: A comparative study of ChatGPT-4o and Meta AI's Llama 3.1.
Adv Ophthalmol Pract Res. 2025 Jan 6;5(2):95-99. doi: 10.1016/j.aopr.2025.01.002. eCollection 2025 May-Jun.
9
Harnessing GPT-4 for automated error detection in pathology reports: Implications for oncology diagnostics.
Digit Health. 2025 May 29;11:20552076251346703. doi: 10.1177/20552076251346703. eCollection 2025 Jan-Dec.
10
Artificial intelligence in clinical practice: a cross-sectional survey of paediatric surgery residents' perspectives.
BMJ Health Care Inform. 2025 May 21;32(1):e101456. doi: 10.1136/bmjhci-2025-101456.