Orthopedics. 2024 May-Jun;47(3):e146-e150. doi: 10.3928/01477447-20240304-02. Epub 2024 Mar 12.
Artificial intelligence (AI) generative large language models are powerful and increasingly accessible tools with potential applications in health care education and training. The annual Orthopaedic In-Training Examination (OITE) is widely used to assess resident academic progress and preparation for the American Board of Orthopaedic Surgery Part 1 Examination.
OpenAI's ChatGPT and Google's Bard generative language models were administered the 2022 OITE. Question stems containing images were input first without and then with a text-based description of the imaging findings.
ChatGPT answered 69.1% of questions correctly. When provided with text describing accompanying media, this increased to 77.8% correct. In contrast, Bard answered 49.8% of questions correctly, increasing to 58% correct when text describing imaging in question stems was provided (P<.0001). ChatGPT was most accurate on questions in the shoulder category, with 90.9% correct. Bard performed best in the sports category, with 65.4% correct. ChatGPT performed above the published mean of Accreditation Council for Graduate Medical Education orthopedic resident test-takers (66%).
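The gap between the two models' overall accuracy (69.1% vs 49.8%) can be checked for statistical significance with a standard two-proportion z-test. The sketch below is illustrative only: the total question count used here is an assumption (the abstract does not state it), and the abstract does not specify which comparison the reported P value refers to.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test.

    x1, x2: number of correct answers; n1, n2: number of questions.
    Returns (z statistic, two-sided p-value) using the pooled-variance
    normal approximation.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical question count for illustration; the actual number of
# scoreable 2022 OITE items is not given in this abstract.
n = 200
z, p = two_proportion_z(round(0.691 * n), n, round(0.498 * n), n)
print(f"z = {z:.2f}, p = {p:.2e}")
```

With roughly 200 questions per model, a 69.1% vs 49.8% split yields a z statistic near 4 and a p-value well below .001, consistent in spirit with the highly significant differences the study reports.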
There is significant variability in the accuracy of publicly available AI models on the OITE. AI generative language software may play numerous potential roles in orthopedic education, including simulating patient presentations and clinical scenarios, customizing individual learning plans, and driving evidence-based case discussion. Further research and collaboration within the orthopedic community are required to safely adopt these tools and minimize the risks associated with their use. [Orthopedics. 2024;47(3):e146-e150.].