Kung Justin E, Marshall Christopher, Gauthier Chase, Gonzalez Tyler A, Jackson J Benjamin
Department of Orthopedic Surgery, Prisma Health-Midlands/University of South Carolina, Columbia, South Carolina.
University of South Carolina School of Medicine, Columbia, South Carolina.
JBJS Open Access. 2023 Sep 8;8(3). doi: 10.2106/JBJS.OA.23.00056. eCollection 2023 Jul-Sep.
Artificial intelligence (AI) holds potential for improving medical education and healthcare delivery. ChatGPT is a state-of-the-art natural language processing AI model that has demonstrated impressive capabilities, scoring in the top percentiles on numerous standardized examinations, including the Uniform Bar Exam and the Scholastic Aptitude Test. The goal of this study was to evaluate ChatGPT's performance on the Orthopaedic In-Training Examination (OITE), an assessment of the medical knowledge of orthopedic surgery residents.
OITE 2020, 2021, and 2022 questions without images were input into ChatGPT version 3.5 and version 4 (GPT-4) with zero prompting. ChatGPT's performance was evaluated as the percentage of correct responses and compared with the national average of orthopedic surgery residents at each postgraduate year (PGY) level. ChatGPT was asked to provide a source for each answer; sources were categorized as journal articles, books, or websites, and as verifiable or not. The impact factor of each cited journal was also recorded.
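The abstract does not specify how questions were submitted to the models. Purely for illustration, a zero-shot evaluation of this kind could be scripted against the OpenAI chat completions API as in the minimal sketch below; the model name, question format, and letter-matching grader are hypothetical assumptions, not the authors' method (a real study would grade responses manually).

```python
# Illustrative sketch only: zero-shot submission of multiple-choice
# questions, with no system instructions and no examples.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask_model(question_text: str, model: str = "gpt-4") -> str:
    """Submit one question with zero prompting and return the raw reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question_text}],
    )
    return response.choices[0].message.content


def percent_correct(questions: list[dict], model: str) -> float:
    """Score items shaped like {'text': ..., 'answer': 'B'}.

    The containment check below is a crude stand-in for the manual
    grading an actual study would require.
    """
    correct = sum(
        1 for q in questions if q["answer"] in ask_model(q["text"], model)
    )
    return 100 * correct / len(questions)
```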
ChatGPT answered 196 of 360 questions correctly (54.3%), corresponding to the performance of a PGY-1. ChatGPT cited a verifiable source for 47.2% of questions, with an average journal impact factor of 5.4. GPT-4 answered 265 of 360 questions correctly (73.6%), corresponding to the average performance of a PGY-5 and exceeding the 67% score that corresponds to passing the American Board of Orthopaedic Surgery Part I Examination. GPT-4 cited a verifiable source for 87.9% of questions, with an average journal impact factor of 5.2.
ChatGPT performed above the level of the average PGY-1, and GPT-4 performed better than the average PGY-5, demonstrating a marked improvement between versions. Further investigation is needed to determine how successive versions of ChatGPT will perform and how this technology can be optimized to improve medical education.
AI has the potential to aid in medical education and healthcare delivery.