Yüce Ali, Yerli Mustafa, Misir Abdulhamit
Department of Orthopedic and Traumatology, Prof. Dr. Cemil Taşcıoğlu City Hospital, Istanbul, Turkey.
J Shoulder Elbow Surg. 2025 Jan;34(1):141-146. doi: 10.1016/j.jse.2024.04.021. Epub 2024 Jun 7.
BACKGROUND: Patients and healthcare professionals rely extensively on the internet for medical information. Low-quality videos can significantly impact the patient-doctor relationship, potentially affecting consultation efficiency and decision-making processes. Chat Generative Pre-Trained Transformer (ChatGPT) is an artificial intelligence application with the potential to improve medical reports, provide medical information, and supplement orthopedic knowledge acquisition. This study aimed to assess the ability of ChatGPT-4 to detect deficiencies in such videos, with the hypothesis that it would successfully identify them.

MATERIALS AND METHODS: YouTube was searched for "rotator cuff surgery" and "rotator cuff surgery clinic" videos. A total of 90 videos were evaluated, of which 40 were included in the study after exclusions. Transcripts of these videos were obtained using the Google Chrome extension "YouTube Summary with ChatGPT & Claude." Two senior orthopedic surgeons and ChatGPT-4 evaluated the videos using the rotator cuff surgery YouTube score (RCSS) system and the DISCERN criteria.

RESULTS: ChatGPT-4's evaluations matched those of the human observers in 25% of instances for the RCSS and 40% for DISCERN. Interobserver agreement between the human observers and ChatGPT-4 was fair (AC1: 0.575 for DISCERN; AC1: 0.516 for RCSS). Even after correcting ChatGPT-4's incorrect answers, agreement did not change significantly. ChatGPT-4 tended to assign higher scores than the observers, particularly in sections related to anatomy, surgical technique, and indications for surgery.

CONCLUSION: The use of ChatGPT-4 as an observer for evaluating rotator cuff surgery-related videos and identifying their deficiencies is not currently recommended. Future studies with trained ChatGPT models may address these shortcomings and enable ChatGPT to evaluate videos at the level of a human observer.