Yaş Semih, Yapar Dilek, Yapar Aliekber, Özel Tayfun, Tokgöz Mehmet Ali, Baymurat Alim Can, Şenköylü Alpaslan
Department of Orthopaedics and Traumatology, Turkish Ministry of Health, Dr. Abdurrahman Yurtaslan Ankara Oncology Training and Research Hospital, Ankara, Türkiye.
Department of Public Health, Turkish Ministry of Health, Muratpasa District Health Directorate, Antalya, Türkiye.
Acta Orthop Traumatol Turc. 2025 Jul 18;59(4):222-229. doi: 10.5152/j.aott.2025.25279.
Objective: To evaluate the accuracy, applicability, comprehensiveness, and communication quality of responses generated by ChatGPT and Google Gemini in adolescent idiopathic scoliosis (AIS)-related scenarios, with the aim of assessing their potential utility as tools in patient management.

Methods: Six case-based questions reflecting common patient concerns related to AIS were developed by orthopedic specialists. Responses generated by ChatGPT and Google Gemini were independently evaluated by 61 orthopedic surgeons using a standardized rubric assessing accuracy, applicability, comprehensiveness, and communication clarity, each rated on a 1-5 Likert scale. Comparative analyses between the platforms were performed using the Mann-Whitney U and Wilcoxon signed-rank tests. Additionally, open-ended feedback was collected to explore participants' perspectives on the potential and limitations of AI-based consultations.

Results: ChatGPT outperformed Google Gemini in accuracy (P = .013) in postoperative care scenarios; differences in applicability (P = .119), comprehensiveness (P = .619), and communication (P = .240) were not statistically significant. Orthopedic specialists rated both AI models significantly higher than residents did in accuracy, applicability, and comprehensiveness. Most evaluators acknowledged the potential of AI to reduce physician workload and support patient guidance; however, concerns were raised regarding reliability, ethical implications, and the current limitations of AI in ensuring patient safety.

Conclusion: ChatGPT and Google Gemini demonstrated moderate accuracy and communication quality in AIS-related scenarios, with ChatGPT showing a modest advantage. Although both models show promise as supportive tools for patient education and preliminary consultations, their current limitations in accuracy and comprehensiveness restrict their clinical reliability.
Multidisciplinary collaboration is crucial to ensure effective applications of AI in orthopedic practice. Level of Evidence: Level III, Diagnostic Study.
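The paired (Wilcoxon signed-rank) and unpaired (Mann-Whitney U) comparisons of 1-5 Likert ratings described in the Methods can be sketched as follows. This is a minimal illustration using SciPy with randomly generated ratings; the rating values, group sizes beyond the reported 61 evaluators, and variable names are assumptions, not the study's data or code.

```python
# Illustrative sketch of the abstract's statistical comparisons.
# Ratings are simulated 1-5 Likert scores, NOT the study's data.
import random
from scipy.stats import mannwhitneyu, wilcoxon

random.seed(0)
n_evaluators = 61  # number of orthopedic surgeons reported in the abstract

# Each evaluator rates both platforms on the same item -> paired data.
chatgpt_ratings = [random.randint(3, 5) for _ in range(n_evaluators)]
gemini_ratings = [random.randint(2, 5) for _ in range(n_evaluators)]

# Paired comparison between platforms: Wilcoxon signed-rank test.
w_stat, p_paired = wilcoxon(chatgpt_ratings, gemini_ratings)

# Unpaired comparison (e.g., specialists vs. residents):
# Mann-Whitney U test on two independent rating samples.
specialists = [random.randint(3, 5) for _ in range(30)]
residents = [random.randint(2, 4) for _ in range(31)]
u_stat, p_unpaired = mannwhitneyu(specialists, residents,
                                  alternative="two-sided")

print(f"Wilcoxon P = {p_paired:.3f}, Mann-Whitney P = {p_unpaired:.3f}")
```

Ordinal Likert data violate the normality assumption of t-tests, which is why rank-based tests such as these are the conventional choice for this kind of rubric comparison.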