Nieves-Lopez Benjamin, Wing Clayton, Springer Bryan D, Aziz Keith T
University of Puerto Rico, Medical Sciences Campus, San Juan, Puerto Rico.
Department of Orthopedic Surgery, Mayo Clinic Florida, Jacksonville, FL.
Arthroplast Today. 2025 Jul 14;34:101772. doi: 10.1016/j.artd.2025.101772. eCollection 2025 Aug.
Chat Generative Pre-trained Transformer (ChatGPT) is a language model designed to conduct conversations using extensive data from the internet. Despite its potential, the utility of ChatGPT in orthopaedic surgery, particularly in arthroplasty, is still being investigated. This study assesses ChatGPT's performance on arthroplasty-related questions in comparison to an Adult Reconstruction Fellow and a senior-level attending arthroplasty surgeon.
A total of 299 questions from the Adult Reconstruction self-assessment on OrthoBullets were evaluated using ChatGPT 4. Performance was analyzed across different question categories and compared with the performance of an Adult Reconstruction Fellow and a senior-level attending arthroplasty surgeon using a chi-square test. Further comparisons were performed to assess ChatGPT's accuracy rate on image-based questions. Statistical significance was set at a P value ≤ .05.
ChatGPT achieved a 66.9% accuracy rate, compared to 84.3% and 85.3% for the Fellow and Attending, respectively. No significant differences in ChatGPT's performance were observed across question categories. ChatGPT demonstrated better results on text-only than on image-based questions. Although not statistically significant, ChatGPT showed its highest accuracy rate on questions that included both an X-ray and a clinical picture.
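To make the comparison concrete, below is a minimal sketch of a chi-square test of the kind described in the methods, with correct/incorrect counts reconstructed from the reported accuracy rates (66.9%, 84.3%, and 85.3% of 299 questions). The authors' per-question data, pairwise comparison structure, and statistical software are not given here, so this is an illustrative approximation using scipy, not the study's actual analysis.

```python
# Approximate chi-square comparison of accuracy across the three responders.
# Counts are reconstructed from the reported percentages (assumption), not
# taken from the study's raw data.
from scipy.stats import chi2_contingency

TOTAL = 299
rates = {"ChatGPT": 0.669, "Fellow": 0.843, "Attending": 0.853}

# 2 x 3 contingency table: rows = correct / incorrect, columns = responder
correct = [round(r * TOTAL) for r in rates.values()]
incorrect = [TOTAL - c for c in correct]
table = [correct, incorrect]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
# A p value <= .05 would indicate a significant difference in accuracy
# across the three responders, matching the study's significance threshold.
```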
ChatGPT performed worse than both the Adult Reconstruction Fellow and the Attending, and it provided more accurate answers when prompted with text-only questions. These findings suggest that while ChatGPT can serve as a useful supplementary resource for arthroplasty topics, it cannot substitute for the clinical judgment required in detailed assessments. Further research is necessary to optimize and validate the use of artificial intelligence in medical education and patient care.