Yaş Semih, Yapar Dilek, Yapar Aliekber, Özel Tayfun, Tokgöz Mehmet Ali, Baymurat Alim Can, Şenköylü Alpaslan
Department of Orthopaedics and Traumatology, Turkish Ministry of Health, Dr. Abdurrahman Yurtaslan Ankara Oncology Training and Research Hospital, Ankara, Türkiye.
Department of Public Health, Turkish Ministry of Health, Muratpasa District Health Directorate, Antalya, Türkiye.
Acta Orthop Traumatol Turc. 2025 Jul 18;59(4):222-229. doi: 10.5152/j.aott.2025.25279.
Objective: To evaluate the accuracy, applicability, comprehensiveness, and communication quality of responses generated by ChatGPT and Google Gemini in adolescent idiopathic scoliosis (AIS)-related scenarios, with the aim of assessing their potential utility as tools in patient management.

Methods: Six case-based questions reflecting common patient concerns related to AIS were developed by orthopedic specialists. Responses generated by ChatGPT and Google Gemini were independently evaluated by 61 orthopedic surgeons using a standardized rubric assessing accuracy, applicability, comprehensiveness, and communication clarity, each rated on a 1-5 Likert scale. Comparative analyses between the platforms were performed using the Mann-Whitney U and Wilcoxon signed-rank tests. Additionally, open-ended feedback was collected to explore participants' perspectives on the potential and limitations of AI-based consultations.

Results: ChatGPT outperformed Google Gemini in accuracy (P = .013) in postoperative care scenarios; differences in applicability (P = .119), comprehensiveness (P = .619), and communication (P = .240) were not statistically significant. Orthopedic specialists rated both AI models significantly higher than residents did in accuracy, applicability, and comprehensiveness. Most evaluators acknowledged the potential of AI to reduce physician workload and support patient guidance; however, concerns were raised regarding reliability, ethical implications, and the current limitations of AI in ensuring patient safety.

Conclusion: ChatGPT and Google Gemini demonstrated moderate accuracy and communication quality in AIS-related scenarios, with ChatGPT showing a modest advantage. Although both models show promise as supportive tools for patient education and preliminary consultations, their current limitations in accuracy and comprehensiveness restrict their clinical reliability.
Multidisciplinary collaboration is crucial to ensure effective applications of AI in orthopedic practice. Level of Evidence: Level III, Diagnostic Study.
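The paired (Wilcoxon signed-rank) and unpaired (Mann-Whitney U) comparisons of 1-5 Likert ratings described in the Methods can be sketched as follows. This is a minimal illustration using SciPy with randomly generated ratings; the rating values, group sizes beyond the reported 61 evaluators, and variable names are assumptions, not the study's data or code.

```python
# Illustrative sketch of the abstract's statistical comparisons.
# Ratings are simulated 1-5 Likert scores, NOT the study's data.
import random
from scipy.stats import mannwhitneyu, wilcoxon

random.seed(0)
n_evaluators = 61  # number of orthopedic surgeons reported in the abstract

# Each evaluator rates both platforms on the same item -> paired data.
chatgpt_ratings = [random.randint(3, 5) for _ in range(n_evaluators)]
gemini_ratings = [random.randint(2, 5) for _ in range(n_evaluators)]

# Paired comparison between platforms: Wilcoxon signed-rank test.
w_stat, p_paired = wilcoxon(chatgpt_ratings, gemini_ratings)

# Unpaired comparison (e.g., specialists vs. residents):
# Mann-Whitney U test on two independent rating samples.
specialists = [random.randint(3, 5) for _ in range(30)]
residents = [random.randint(2, 4) for _ in range(31)]
u_stat, p_unpaired = mannwhitneyu(specialists, residents,
                                  alternative="two-sided")

print(f"Wilcoxon P = {p_paired:.3f}, Mann-Whitney P = {p_unpaired:.3f}")
```

Ordinal Likert data violate the normality assumption of t-tests, which is why rank-based tests such as these are the conventional choice for this kind of rubric comparison.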