
ChatGPT, Bard, and Bing Chat Are Large Language Processing Models That Answered Orthopaedic In-Training Examination Questions With Similar Accuracy to First-Year Orthopaedic Surgery Residents.

Author Information

Guerra Gage A, Hofmann Hayden L, Le Jonathan L, Wong Alexander M, Fathi Amir, Mayfield Cory K, Petrigliano Frank A, Liu Joseph N

Affiliations

USC Epstein Family Center for Sports Medicine, Keck Medicine of USC, Los Angeles, California, U.S.A.

Publication Information

Arthroscopy. 2025 Mar;41(3):557-562. doi: 10.1016/j.arthro.2024.08.023. Epub 2024 Aug 28.

DOI:10.1016/j.arthro.2024.08.023
PMID:39209078
Abstract

PURPOSE

To assess ChatGPT's, Bard's, and Bing Chat's ability to generate accurate orthopaedic diagnoses or corresponding treatments by comparing their performance on the Orthopaedic In-Training Examination (OITE) with that of orthopaedic trainees.

METHODS

OITE question sets from 2021 and 2022 were compiled to form a large set of 420 questions. ChatGPT (GPT-3.5), Bard, and Bing Chat were instructed to select one of the provided responses to each question. The accuracy of composite questions was recorded and comparatively analyzed to human cohorts including medical students and orthopaedic residents, stratified by postgraduate year (PGY).

RESULTS

ChatGPT correctly answered 46.3% of composite questions whereas Bing Chat correctly answered 52.4% of questions and Bard correctly answered 51.4% of questions on the OITE. When image-associated questions were excluded, ChatGPT's, Bing Chat's, and Bard's overall accuracies improved to 49.1%, 53.5%, and 56.8%, respectively. Medical students correctly answered 30.8%, and PGY-1, -2, -3, -4, and -5 orthopaedic residents correctly answered 53.1%, 60.4%, 66.6%, 70.0%, and 71.9%, respectively.

CONCLUSIONS

ChatGPT, Bard, and Bing Chat are artificial intelligence (AI) models that answered OITE questions with accuracy similar to that of first-year orthopaedic surgery residents. ChatGPT, Bard, and Bing Chat achieved this result without using images or other supplementary media that human test takers are provided.

CLINICAL RELEVANCE

Our comparative performance analysis of AI models on orthopaedic board-style questions highlights ChatGPT's, Bing Chat's, and Bard's clinical knowledge and proficiency. Our analysis establishes a baseline of AI model proficiency in the field of orthopaedics and provides a comparative marker for future, more advanced deep learning models. Although in its elementary phase, future AI models' orthopaedic knowledge may provide clinical support and serve as an educational tool.


Similar Articles

1
ChatGPT, Bard, and Bing Chat Are Large Language Processing Models That Answered Orthopaedic In-Training Examination Questions With Similar Accuracy to First-Year Orthopaedic Surgery Residents.
Arthroscopy. 2025 Mar;41(3):557-562. doi: 10.1016/j.arthro.2024.08.023. Epub 2024 Aug 28.
2
Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE).
J Surg Educ. 2024 Nov;81(11):1645-1649. doi: 10.1016/j.jsurg.2024.08.002. Epub 2024 Sep 14.
3
Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT.
Clin Orthop Relat Res. 2023 Aug 1;481(8):1623-1630. doi: 10.1097/CORR.0000000000002704. Epub 2023 May 23.
4
Effectiveness of AI-powered Chatbots in responding to orthopaedic postgraduate exam questions-an observational study.
Int Orthop. 2024 Aug;48(8):1963-1969. doi: 10.1007/s00264-024-06182-9. Epub 2024 Apr 15.
5
Comparitive performance of artificial intelligence-based large language models on the orthopedic in-training examination.
J Orthop Surg (Hong Kong). 2025 Jan-Apr;33(1):10225536241268789. doi: 10.1177/10225536241268789.
6
Generative Artificial Intelligence Performs at a Second-Year Orthopedic Resident Level.
Cureus. 2024 Mar 13;16(3):e56104. doi: 10.7759/cureus.56104. eCollection 2024 Mar.
7
Performance of Two Artificial Intelligence Generative Language Models on the Orthopaedic In-Training Examination.
Orthopedics. 2024 May-Jun;47(3):e146-e150. doi: 10.3928/01477447-20240304-02. Epub 2024 Mar 12.
8
The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.
Orthopedics. 2024 Mar-Apr;47(2):e85-e89. doi: 10.3928/01477447-20230922-05. Epub 2023 Sep 27.
9
Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment.
Can Assoc Radiol J. 2024 May;75(2):344-350. doi: 10.1177/08465371231193716. Epub 2023 Aug 14.
10
Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study.
JMIR Form Res. 2024 Dec 17;8:e57592. doi: 10.2196/57592.

Cited By

1
ChatGPT-4 Responses on Ankle Cartilage Surgery Often Diverge from Expert Consensus: A Comparative Analysis.
Foot Ankle Orthop. 2025 Aug 13;10(3):24730114251352494. doi: 10.1177/24730114251352494. eCollection 2025 Jul.
2
Performance of AI Models vs. Orthopedic Residents in Turkish Specialty Training Development Exams in Orthopedics.
Sisli Etfal Hastan Tip Bul. 2025 Feb 7;59(2):151-155. doi: 10.14744/SEMB.2025.65289. eCollection 2025.
3
Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis.
J Med Internet Res. 2025 Apr 30;27:e64486. doi: 10.2196/64486.