
The performance of ChatGPT versus neurosurgery residents in neurosurgical board examination-like questions: a systematic review and meta-analysis.

Author Information

Bongco Edgar Dominic A, Cua Sean Kendrich N, Hernandez Mary Angeline Luz U, Pascual Juan Silvestre G, Khu Kathleen Joy O

Affiliation

Division of Neurosurgery, Department of Neurosciences, College of Medicine and Philippine General Hospital, University of the Philippines Manila, Manila, Philippines.

Publication Information

Neurosurg Rev. 2024 Dec 7;47(1):892. doi: 10.1007/s10143-024-03144-y.

Abstract

OBJECTIVE

Large language models such as ChatGPT have been used in various fields of medical education. This study aimed to review the literature on the performance of ChatGPT on neurosurgery board examination-like questions compared with that of neurosurgery residents.

METHODS

A literature search was performed following PRISMA guidelines, covering the period from ChatGPT's release (November 2022) to October 25, 2024. Two reviewers screened for eligible studies, selecting those that used ChatGPT to answer neurosurgery board examination-like questions and compared the results with neurosurgery residents' scores. Risk of bias was assessed using the JBI critical appraisal tool. Overall effect sizes and 95% confidence intervals were determined using a fixed-effects model with alpha set at 0.05.
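The abstract does not specify the software or effect measure used for pooling; the following is a minimal Python sketch of inverse-variance fixed-effect pooling with a 95% confidence interval, assuming hypothetical per-study log odds ratios and standard errors rather than the review's actual data.

    # Fixed-effect (inverse-variance) meta-analysis -- illustrative sketch only.
    # The effect sizes and standard errors below are placeholders, not data
    # extracted from the six included studies.
    import math

    studies = [(0.35, 0.12), (0.10, 0.20), (0.42, 0.15)]   # (log OR, SE), hypothetical

    weights = [1.0 / se ** 2 for _, se in studies]          # inverse-variance weights
    pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))

    z = 1.96                                                # two-sided alpha = 0.05
    ci_low, ci_high = pooled - z * pooled_se, pooled + z * pooled_se
    print(f"pooled log OR = {pooled:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")

A fixed-effect model weights each study by the inverse of its variance, which is why a single highest-weighted study can dominate the pooled estimate, as the sensitivity analysis below suggests.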

RESULTS

After screening, six studies were selected for qualitative and quantitative analysis. The accuracy of ChatGPT ranged from 50.4% to 78.8%, compared with residents' accuracy of 58.3% to 73.7%. Risk of bias was low in four of the six studies reviewed; the rest had moderate risk. There was an overall trend favoring neurosurgery residents over ChatGPT (p < 0.00001), with high heterogeneity (I² = 96%). These findings were similar on subgroup analysis of studies that used Self-Assessment in Neurosurgery (SANS) examination questions. However, on sensitivity analysis, removal of the highest-weighted study shifted the results toward better performance by ChatGPT.
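For context on the reported heterogeneity, I² is conventionally derived from Cochran's Q as I² = (Q - df)/Q × 100%; the sketch below uses a hypothetical Q chosen only to reproduce a value of 96% across six studies, not figures from the review.

    # I^2 heterogeneity statistic from Cochran's Q -- illustrative values only.
    def i_squared(q: float, num_studies: int) -> float:
        """Return I^2 as a percentage; df = number of studies - 1."""
        if q <= 0:
            return 0.0
        df = num_studies - 1
        return max(0.0, (q - df) / q) * 100.0

    print(f"I^2 = {i_squared(125.0, 6):.0f}%")              # -> I^2 = 96%

An I² of 96% indicates that nearly all of the observed variation in effect sizes reflects between-study differences rather than chance, consistent with the high heterogeneity noted in the conclusion.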

CONCLUSION

Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although the reviewed studies had high heterogeneity. Further improvement is necessary before ChatGPT can become a useful and reliable supplementary tool in the delivery of neurosurgical education.

