Megafu Michael, Guerrero Omar, Hasan Rafay, Hunt Larry, Langhelm Devri, Le Benning, Li Xinning, Kelly Robert, Parisien Robert L, Cusano Antonio
Department of Orthopaedic Surgery, University of Connecticut, Farmington, CT, USA.
A.T. Still University School of Osteopathic Medicine in Arizona, Mesa, AZ, USA.
JSES Int. 2025 Apr 10;9(4):1365-1370. doi: 10.1016/j.jseint.2025.03.011. eCollection 2025 Jul.
Integrating machine learning and artificial intelligence (AI) technologies has transformed many sectors, including health care. However, their application in orthopedic health-care settings remains limited. This study sought to evaluate the capacity of Chat Generative Pre-Trained Transformer (ChatGPT) and Gemini to make quality medical recommendations regarding glenohumeral osteoarthritis, weighing their responses against the recommendations established in the Evidence-Based Clinical Practice Guidelines (CPGs) of the American Academy of Orthopaedic Surgeons (AAOS).
The 2020 AAOS CPGs, a widely recognized and respected source, served as the reference standard for recommended and nonrecommended treatments in this study. ChatGPT and Gemini were queried on 20 treatments drawn from these guidelines: 10 recommended for managing glenohumeral joint osteoarthritis, five not recommended, and five addressed as consensus statements. Responses were categorized as "Concordance" or "No Concordance" with the AAOS CPGs, and Cohen's kappa coefficient was calculated to assess interrater reliability.
Of the 10 recommended treatments examined, ChatGPT and Gemini showed concordance with the AAOS CPGs for 10 (100%) and 5 (50%), respectively. For treatments that the AAOS CPGs did not recommend, ChatGPT showed concordance for four of the five treatments (80%), while Gemini showed 100% concordance. Cohen's kappa coefficient for interrater reliability was 0.90, indicating a very high level of agreement between the two raters in categorizing responses as "Concordance" or "No Concordance" with the AAOS CPGs.
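The interrater reliability statistic used above can be illustrated with a minimal sketch of Cohen's kappa for two raters. The labels below are hypothetical placeholders, not the study's actual rating data; the formula itself is the standard κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is chance agreement.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Compute Cohen's kappa for two raters' categorical labels."""
    n = len(rater_a)
    # Observed agreement: proportion of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of each
    # rater's marginal probability for that category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical "Concordance" (1) / "No Concordance" (0) ratings.
rater_1 = [1, 1, 0, 1, 0]
rater_2 = [1, 1, 0, 0, 0]
print(round(cohens_kappa(rater_1, rater_2), 4))
```

A kappa of 0.90, as reported in the study, falls in the range conventionally interpreted as almost perfect agreement.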
The study findings reveal that ChatGPT and Gemini cannot be relied upon alone to reproduce the recommendations outlined in the AAOS CPGs. As patients increasingly turn to external resources such as AI platforms and the Internet for medical recommendations, providers should advise patients to exercise caution when seeking advice from these AI platforms for managing glenohumeral joint osteoarthritis.