Evaluating ChatGPT Performance on the Orthopaedic In-Training Examination.

Authors

Kung Justin E, Marshall Christopher, Gauthier Chase, Gonzalez Tyler A, Jackson J Benjamin

Affiliations

Department of Orthopedic Surgery, Prisma Health-Midlands University of South Carolina, Columbia, South Carolina.

University of South Carolina School of Medicine, Columbia, South Carolina.

Publication information

JB JS Open Access. 2023 Sep 8;8(3). doi: 10.2106/JBJS.OA.23.00056. eCollection 2023 Jul-Sep.

Abstract

BACKGROUND

Artificial intelligence (AI) has the potential to improve medical education and healthcare delivery. ChatGPT is a state-of-the-art natural language processing AI model that has shown impressive capabilities, scoring in the top percentiles on numerous standardized examinations, including the Uniform Bar Exam and the Scholastic Aptitude Test. The goal of this study was to evaluate ChatGPT's performance on the Orthopaedic In-Training Examination (OITE), an assessment of medical knowledge for orthopedic residents.

METHODS

OITE questions from 2020, 2021, and 2022 that did not contain images were entered into ChatGPT version 3.5 and version 4 (GPT-4) with zero prompting. Performance was evaluated as the percentage of correct responses and compared with the national average of orthopedic surgery residents at each postgraduate year (PGY) level. ChatGPT was asked to provide a source for each answer. Each source was categorized as a journal article, book, or website and was checked to determine whether it could be verified. The impact factor of each cited journal was also recorded.

RESULTS

ChatGPT answered 196 of 360 questions correctly (54.3%), corresponding to a PGY-1 level. ChatGPT cited a verifiable source in 47.2% of questions, with an average median journal impact factor of 5.4. GPT-4 answered 265 of 360 questions correctly (73.6%), corresponding to the average performance of a PGY-5 and exceeding the corresponding passing score of 67% for the American Board of Orthopaedic Surgery Part I Examination. GPT-4 cited a verifiable source in 87.9% of questions, with an average median journal impact factor of 5.2.
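As a rough illustration of the arithmetic behind these scores, the sketch below recomputes the percentage of correct answers from the reported counts and compares each with the 67% passing score. The variable and function names are hypothetical and are not taken from the study. The counts and the threshold are the ones reported above. Note that 196 of 360 computes to about 54.4%, marginally different from the reported 54.3%; the abstract does not state the reason for the difference.

```python
# Counts and threshold as reported in the abstract; all names are illustrative.
TOTAL_QUESTIONS = 360
ABOS_PART_I_PASSING_PCT = 67.0

correct_counts = {"ChatGPT 3.5": 196, "GPT-4": 265}


def percent_correct(correct: int, total: int) -> float:
    """Return the share of correct answers as a percentage."""
    return 100.0 * correct / total


# Percentage of correct answers for each model version.
results = {
    name: percent_correct(count, TOTAL_QUESTIONS)
    for name, count in correct_counts.items()
}

for name, pct in results.items():
    status = "above" if pct > ABOS_PART_I_PASSING_PCT else "below"
    print(f"{name}: {pct:.1f}% correct, {status} the 67% passing score")
```

The comparison with the PGY-level national averages is not reproduced here because the abstract does not list those averages.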

CONCLUSIONS

ChatGPT performed above the average PGY-1 level, and GPT-4 performed better than the average PGY-5 level, a major improvement between versions. Further investigation is needed to determine how successive versions of ChatGPT would perform and how to optimize this technology to improve medical education.

CLINICAL RELEVANCE

AI has the potential to aid in medical education and healthcare delivery.
