Orthopedics. 2024 Mar-Apr;47(2):e85-e89. doi: 10.3928/01477447-20230922-05. Epub 2023 Sep 27.
Artificial intelligence and machine learning models, such as Chat Generative Pre-trained Transformer (ChatGPT), have advanced at a remarkably fast rate. OpenAI released its newest ChatGPT model, GPT-4, in March 2023. The model offers a wide range of medical applications and has demonstrated notable proficiency on many medical board examinations. This study assessed GPT-4's performance on the Orthopaedic In-Training Examination (OITE), which is used to prepare residents for the American Board of Orthopaedic Surgery (ABOS) Part I Examination. GPT-4's results were also compared with those of the previous iteration of ChatGPT, GPT-3.5, which was released 4 months before GPT-4. GPT-4 correctly answered 251 of the 396 attempted questions (63.4%), whereas GPT-3.5 correctly answered 46.3% of 410 attempted questions. GPT-4 was significantly more accurate than GPT-3.5 on orthopedic board-style questions (P<.00001). GPT-4's performance was most comparable to that of an average third-year orthopedic surgery resident, whereas GPT-3.5 performed below the level of an average orthopedic intern. GPT-4's overall accuracy was just below the approximate threshold that indicates a likely pass on the ABOS Part I Examination. Our results demonstrate significant improvements in OpenAI's newest model, GPT-4. Future studies should assess potential clinical applications as AI models continue to be trained on larger data sets and offer more capabilities. [Orthopedics. 2024;47(2):e85-e89.]
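The accuracy comparison reported above can be roughly checked with a short calculation. The abstract does not state which statistical test the authors used, so the pooled two-proportion z-test below is an assumption chosen for illustration, not the study's method. The GPT-3.5 count of 190 correct answers is also inferred rather than reported: it is the only whole number of 410 questions that rounds to 46.3%. The function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(correct_a, total_a, correct_b, total_b):
    """Pooled two-proportion z-test (assumed here; the abstract names no test).

    Returns the z statistic and the two-sided p-value.
    """
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (correct_a + correct_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# GPT-4: 251 of 396 correct (reported in the abstract).
# GPT-3.5: 190 of 410 correct (inferred from the reported 46.3%).
z, p = two_proportion_z_test(251, 396, 190, 410)
print(f"GPT-4 accuracy:   {251 / 396:.1%}")
print(f"GPT-3.5 accuracy: {190 / 410:.1%}")
print(f"z = {z:.2f}, two-sided p = {p:.2e}")
```

Under these assumptions the two-sided p-value falls below .00001, in line with the significance level reported in the abstract. A chi-square test on the same 2x2 table would be an equally reasonable choice and leads to the same conclusion.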