

Can AI pass the written European Board Examination in Neurological Surgery? - Ethical and practical issues.

Authors

Stengel Felix C, Stienen Martin N, Ivanov Marcel, Gandía-González María L, Raffa Giovanni, Ganau Mario, Whitfield Peter, Motov Stefan

Affiliations

Department of Neurosurgery & Spine Center of Eastern Switzerland, Kantonsspital St. Gallen & Medical School of St. Gallen, St. Gallen, Switzerland.

Royal Hallamshire Hospital, Sheffield, United Kingdom.

Publication

Brain Spine. 2024 Feb 13;4:102765. doi: 10.1016/j.bas.2024.102765. eCollection 2024.

DOI: 10.1016/j.bas.2024.102765
PMID: 38510593
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10951784/
Abstract

INTRODUCTION

Artificial intelligence (AI)-based large language models (LLMs) hold enormous potential in education and training. Recent publications have demonstrated that they can outperform human participants in written medical exams.

RESEARCH QUESTION

We aimed to explore the accuracy of AI in the written part of the European Association of Neurosurgical Societies (EANS) board exam.

MATERIAL AND METHODS

Eighty-six representative single-best-answer (SBA) questions, each included at least ten times in prior EANS board exams, were selected by the current EANS board exam committee. By content, 75 questions were classified as text-based (TB) and 11 as image-based (IB); by structure, 50 were interpretation-weighted, 30 theory-based, and 6 true-or-false. The questions were tested with ChatGPT 3.5, Bing, and Bard. The AI and participant results were statistically analyzed through ANOVA tests with Stata SE 15 (StataCorp, College Station, TX). P-values < 0.05 were considered statistically significant.
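The comparison described above — per-question correctness scored across the LLMs and the human cohort, then tested with one-way ANOVA at α = 0.05 — was run in Stata. A minimal pure-Python sketch of the same kind of test, using made-up toy correctness vectors (not the study's data), might look like:

```python
# Illustrative sketch of a one-way ANOVA F statistic; the vectors below are
# hypothetical 0/1 correctness scores, NOT the 86 EANS SBA question results.
def one_way_anova_F(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)                           # number of groups
    N = sum(len(g) for g in groups)           # total observations
    grand_mean = sum(sum(g) for g in groups) / N
    # Between-group sum of squares (group size times squared mean deviation)
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares (squared deviations from each group mean)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (N - k))

chatgpt = [1, 0, 1, 1, 0, 1, 0, 1]   # 1 = correct, 0 = wrong (toy data)
bing    = [1, 1, 0, 1, 1, 0, 1, 1]
bard    = [1, 1, 1, 0, 1, 1, 1, 0]
humans  = [1, 0, 1, 0, 1, 1, 0, 1]

F = one_way_anova_F([chatgpt, bing, bard, humans])
print(f"F = {F:.3f}")
# The p-value is then read from the F(k-1, N-k) distribution, and the paper's
# decision rule treats p < 0.05 as statistically significant.
```

The sketch stops at the F statistic; a statistics package (Stata, as in the paper, or SciPy's `stats.f_oneway`) would convert it to a p-value from the F distribution.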

RESULTS

The Bard LLM achieved the highest accuracy, with 62% of questions correct overall and 69% when IB questions were excluded, outperforming the human exam participants' 59% (p = 0.67) and 59% (p = 0.42), respectively. All LLMs scored highest on theory-based questions, excluding IB questions (ChatGPT: 79%; Bing: 83%; Bard: 86%), performing significantly better than the human exam participants (60%; p = 0.03). No LLM answered any IB question correctly.

DISCUSSION AND CONCLUSION

AI passed the written EANS board exam based on representative SBA questions and achieved results close to, or even better than, those of the human exam participants. Our results raise several ethical and practical issues, which may impact the current concept of the written EANS board exam.


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef56/10951784/b52746c25552/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef56/10951784/a36814b6d401/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef56/10951784/bb53032f01a7/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef56/10951784/fd28cd1657e9/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef56/10951784/e56133203f8c/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef56/10951784/e0453230db75/gr6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef56/10951784/140aa6cddd25/gr7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef56/10951784/a19dfec2da55/gr8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef56/10951784/904c182a3764/gr9.jpg

Similar Articles

1. Can AI pass the written European Board Examination in Neurological Surgery? - Ethical and practical issues. Brain Spine. 2024 Feb 13;4:102765. doi: 10.1016/j.bas.2024.102765. eCollection 2024.
2. Comparing the Performance of Popular Large Language Models on the National Board of Medical Examiners Sample Questions. Cureus. 2024 Mar 11;16(3):e55991. doi: 10.7759/cureus.55991. eCollection 2024 Mar.
3. Advancing Medical Education: Performance of Generative Artificial Intelligence Models on Otolaryngology Board Preparation Questions With Image Analysis Insights. Cureus. 2024 Jul 9;16(7):e64204. doi: 10.7759/cureus.64204. eCollection 2024 Jul.
4. Performance of artificial intelligence chatbots in sleep medicine certification board exams: ChatGPT versus Google Bard. Eur Arch Otorhinolaryngol. 2024 Apr;281(4):2137-2143. doi: 10.1007/s00405-023-08381-3. Epub 2023 Dec 20.
5. Generative pretrained transformer-4, an artificial intelligence text predictive model, has a high capability for passing novel written radiology exam questions. Int J Comput Assist Radiol Surg. 2024 Apr;19(4):645-653. doi: 10.1007/s11548-024-03071-9. Epub 2024 Feb 21.
6. Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard. JMIR Med Educ. 2024 Feb 21;10:e51523. doi: 10.2196/51523.
7. Performance of ChatGPT, GPT-4, and Google Bard on a Neurosurgery Oral Boards Preparation Question Bank. Neurosurgery. 2023 Nov 1;93(5):1090-1098. doi: 10.1227/neu.0000000000002551. Epub 2023 Jun 12.
8. Performance of Progressive Generations of GPT on an Exam Designed for Certifying Physicians as Certified Clinical Densitometrists. J Clin Densitom. 2024 Apr-Jun;27(2):101480. doi: 10.1016/j.jocd.2024.101480. Epub 2024 Feb 17.
9. Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT. Clin Orthop Relat Res. 2023 Aug 1;481(8):1623-1630. doi: 10.1097/CORR.0000000000002704. Epub 2023 May 23.
10. GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions. World Neurosurg. 2023 Nov;179:e160-e165. doi: 10.1016/j.wneu.2023.08.042. Epub 2023 Aug 18.

Cited By

1. Exploring perspectives and boundaries in neurosurgical career pathways for generation Z in German-speaking countries. Brain Spine. 2025 Aug 6;5:104382. doi: 10.1016/j.bas.2025.104382. eCollection 2025.
2. Can we trust academic AI detective? Accuracy and limitations of AI-output detectors. Acta Neurochir (Wien). 2025 Aug 7;167(1):214. doi: 10.1007/s00701-025-06622-4.
3. Assessing the performance of ChatGPT-4o on the Turkish Orthopedics and Traumatology Board Examination. Jt Dis Relat Surg. 2025 Apr 5;36(2):304-310. doi: 10.52312/jdrs.2025.1958.
4. Performance of 5 Prominent Large Language Models in Surgical Knowledge Evaluation: A Comparative Analysis. Mayo Clin Proc Digit Health. 2024 Jun 5;2(3):348-350. doi: 10.1016/j.mcpdig.2024.05.022. eCollection 2024 Sep.
5. Employing large language models safely and effectively as a practicing neurosurgeon. Acta Neurochir (Wien). 2025 Apr 9;167(1):101. doi: 10.1007/s00701-025-06515-6.
6. Reliability, Accuracy, and Comprehensibility of AI-Based Responses to Common Patient Questions Regarding Spinal Cord Stimulation. J Clin Med. 2025 Feb 21;14(5):1453. doi: 10.3390/jcm14051453.
7. ChatGPT's Performance in Spinal Metastasis Cases - Can We Discuss Our Complex Cases with ChatGPT? J Clin Med. 2024 Dec 23;13(24):7864. doi: 10.3390/jcm13247864.
8. Large language models in neurosurgery: a systematic review and meta-analysis. Acta Neurochir (Wien). 2024 Nov 23;166(1):475. doi: 10.1007/s00701-024-06372-9.
9. Assessing ChatGPT's summarization of Ga PSMA PET/CT reports for patients. Abdom Radiol (NY). 2025 Mar;50(3):1467-1474. doi: 10.1007/s00261-024-04619-8. Epub 2024 Sep 30.
10. Assessment Study of ChatGPT-3.5's Performance on the Final Polish Medical Examination: Accuracy in Answering 980 Questions. Healthcare (Basel). 2024 Aug 16;12(16):1637. doi: 10.3390/healthcare12161637.

References

1. Advantages and pitfalls in utilizing artificial intelligence for crafting medical examinations: a medical education pilot study with GPT-4. BMC Med Educ. 2023 Oct 17;23(1):772. doi: 10.1186/s12909-023-04752-w.
2. Large Language Model-Based Neurosurgical Evaluation Matrix: A Novel Scoring Criteria to Assess the Efficacy of ChatGPT as an Educational Tool for Neurosurgery Board Preparation. World Neurosurg. 2023 Dec;180:e765-e773. doi: 10.1016/j.wneu.2023.10.043. Epub 2023 Oct 14.
3. GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions. World Neurosurg. 2023 Nov;179:e160-e165. doi: 10.1016/j.wneu.2023.08.042. Epub 2023 Aug 18.
4. ChatGPT - A double-edged sword for healthcare education? Implications for assessments of dental students. Eur J Dent Educ. 2024 Feb;28(1):206-211. doi: 10.1111/eje.12937. Epub 2023 Aug 7.
5. Assessing ChatGPT's ability to pass the FRCS orthopaedic part A exam: A critical analysis. Surgeon. 2023 Oct;21(5):263-266. doi: 10.1016/j.surge.2023.07.001. Epub 2023 Jul 28.
6. European training requirements in neurological surgery: A new outcomes-based 3 stage UEMS curriculum. Brain Spine. 2023 Apr 25;3:101744. doi: 10.1016/j.bas.2023.101744. eCollection 2023.
7. Large language models for oncological applications. J Cancer Res Clin Oncol. 2023 Sep;149(11):9505-9508. doi: 10.1007/s00432-023-04824-w. Epub 2023 May 9.
8. Using AI-generated suggestions from ChatGPT to optimize clinical decision support. J Am Med Inform Assoc. 2023 Jun 20;30(7):1237-1245. doi: 10.1093/jamia/ocad072.
9. Harnessing the power of ChatGPT in medical education. Med Teach. 2023 Sep;45(9):1063. doi: 10.1080/0142159X.2023.2198094. Epub 2023 Apr 10.
10. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023 Feb 9;2(2):e0000198. doi: 10.1371/journal.pdig.0000198. eCollection 2023 Feb.