

Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE).

Affiliations

Department of Orthopaedic Surgery, Geisinger Commonwealth School of Medicine, Geisinger Musculoskeletal Institute, Danville, PA.


Publication Information

J Surg Educ. 2024 Nov;81(11):1645-1649. doi: 10.1016/j.jsurg.2024.08.002. Epub 2024 Sep 14.

Abstract

OBJECTIVE

Artificial intelligence (AI) is capable of answering complex medical examination questions, offering the potential to revolutionize medical education and healthcare delivery. In this study, we aimed to assess ChatGPT, a model that has demonstrated exceptional performance on standardized exams. Specifically, we evaluated ChatGPT's performance on the complete 2019 Orthopaedic In-Training Examination (OITE), including questions with an image component. We also explored differences in performance between text-only questions and questions with an associated image, including whether the image was described by AI or by a trained orthopaedist.

DESIGN AND SETTING

Questions from the 2019 OITE were input into ChatGPT version 4.0 (GPT-4) using 3 response variants. Because the capacity to input or interpret images was not publicly available in ChatGPT at the time of this study, questions with an image component were supplemented with textual descriptions of the image generated by Microsoft Azure AI Vision Studio or by authors of the study.

RESULTS

ChatGPT performed equally on OITE questions with and without imaging components, answering an average of 49% and 48% of questions correctly, respectively, across all 3 input methods. Performance dropped by 6% when image descriptions were generated by AI. With single-answer multiple-choice input, ChatGPT answered 49% of questions correctly, nearly double the rate expected from random guessing. ChatGPT performed worse than all resident classes on the 2019 exam, scoring 4% lower than PGY-1 residents.

DISCUSSION

ChatGPT performed below all resident classes on the 2019 OITE. Performance on text-only questions and questions with images was nearly equal when the image was described by a trained orthopaedic specialist but decreased when an AI-generated description was used. Recognizing the performance capabilities of AI software may provide insight into the current and future applications of this technology in medical education.

