Dzieza Wojciech K, Hampton Hailey, Farmer Kevin W, Roach Ryan P, Kwon John Y, Yildirim Ahmet Toygun, Horodyski MaryBeth, Toussaint Rull James
Department of Orthopaedic Surgery and Sports Medicine, College of Medicine, University of Florida, Gainesville, FL, United States.
Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States.
Front Digit Health. 2025 Jul 3;7:1614344. doi: 10.3389/fdgth.2025.1614344. eCollection 2025.
Artificial intelligence (AI) chatbots have gained popularity as a source of information that is easily accessed by patients. The best treatment of acute Achilles tendon ruptures (AATR) remains controversial due to varying surgical repair techniques, postoperative protocols, nonoperative treatment options, and surgeon and patient factors. Given that patients will continue to turn towards AI for answers to medical questions, the purpose of this study is to evaluate whether popular AI engines can provide adequate responses to frequently asked questions regarding AATR.
Three AI engines (ChatGPT, Google Gemini, and Microsoft Copilot) were prompted for a concise response to ten common questions regarding AATR management. Four board-certified orthopaedic surgeons were asked to assess the responses using a four-point scale. A Kruskal-Wallis test was used to compare the responses between the three AI systems using the scores assigned by the surgeons.
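As an illustration of the comparison described above, the following is a minimal sketch of a Kruskal-Wallis test applied to one question's ratings. The ratings shown are hypothetical placeholders, not data from the study, and the function name `kruskal_wallis_three_groups` is an assumption introduced here. The sketch uses the standard tie-corrected H statistic with the chi-square approximation (2 degrees of freedom for three groups), which is only a rough approximation with four raters per group; the authors' actual software and settings are not stated in the abstract.

```python
import math
from itertools import chain


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three independent groups, with tie correction.

    The p-value uses the chi-square approximation with 2 degrees of
    freedom, whose survival function is exactly exp(-H / 2).
    """
    groups = (a, b, c)
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)

    # Assign each distinct value the mean of the rank positions it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                               # size of this tie group
        rank_of[pooled[i]] = (i + 1 + j) / 2    # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j

    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        raise ValueError("all ratings are identical; H is undefined")

    rank_sum_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    p = math.exp(-h / 2)          # chi-square survival function, df = 2
    return h, p


# Hypothetical ratings from four surgeons on a four-point scale
# (lower = better), for a single question. NOT data from the study.
chatgpt = [1, 1, 2, 1]
gemini = [1, 2, 1, 1]
copilot = [3, 3, 2, 4]

h_stat, p_value = kruskal_wallis_three_groups(chatgpt, gemini, copilot)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")
```

A significant overall result for a question would then be followed by pairwise comparisons between engines with adjusted P values, which this sketch does not implement.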
All three engines provided comparable answers to 7 of 10 questions (70%). Significant differences were noted between the AI systems for three of the ten questions (Question 4, overall P = .027; Question 7, overall P = .043; and Question 10, overall P = .033). Post hoc analyses revealed that Copilot received significantly poorer scores (higher mean ratings) compared to Gemini for Question 4 (adjusted P = .028) and Question 7 (adjusted P = .036), and a poorer score compared to ChatGPT for Question 10 (adjusted P = .033).
AI chatbots can appropriately answer concise prompts about diagnosis and management of AATR. The responses provided by the three AI chatbots analyzed in our study were largely uniform and satisfactory, with only one of the engines scoring lower on three of the ten questions. As AI engines advance, they will become an important tool for patient education in orthopaedics.