Arora Vikram, Silburt Joseph, Phillips Mark, Khan Moin, Petrisor Brad, Chaudhry Harman, Mundi Raman, Bhandari Mohit
Department of Surgery, McMaster University, Hamilton, CAN.
Department of Orthopaedic Surgery, University of Toronto, Toronto, CAN.
Cureus. 2024 Jul 25;16(7):e65343. doi: 10.7759/cureus.65343. eCollection 2024 Jul.
Objective: To compare the quality of responses from three chatbots (ChatGPT, Bing Chat, and AskOE) across a range of orthopaedic surgery therapeutic treatment questions.

Design: We identified a series of treatment-related questions spanning a range of orthopaedic surgery subspecialties. Each question was entered identically into each of the three chatbots (ChatGPT, Bing Chat, and AskOE), and the responses were reviewed using a standardized rubric.

Participants: Orthopaedic surgery experts affiliated with McMaster University and the University of Toronto reviewed all responses in a blinded fashion.

Outcomes: The primary outcomes were scores on a five-item assessment tool covering clinical correctness, clinical completeness, safety, usefulness, and references. The secondary outcome was the reviewers' preferred response for each question. We performed a mixed-effects logistic regression to identify factors associated with selecting a preferred chatbot.

Results: Across all questions and answers, reviewers preferred AskOE significantly more often than both ChatGPT (P<0.001) and Bing Chat (P<0.001). AskOE also received significantly higher total evaluation scores than both ChatGPT (P<0.001) and Bing Chat (P<0.001). Further regression analysis showed that clinical correctness, clinical completeness, usefulness, and references were significantly associated with a preference for AskOE. Across all responses, four were judged to contain major errors: three from ChatGPT and one from AskOE.

Conclusions: Reviewers significantly preferred AskOE over ChatGPT and Bing Chat across a variety of measures on orthopaedic therapy questions. This technology has important implications in healthcare settings, as it provides access to trustworthy answers in orthopaedic surgery.
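The analysis hinges on a mixed-effects logistic regression with the binary "preferred" choice as the outcome and the five rubric scores as predictors. As a minimal sketch only, not the authors' actual code, the following Python example shows how such a model could be fit with statsmodels; the input file, the column names (preferred, correctness, completeness, safety, usefulness, references, reviewer, question), and the choice of random intercepts for reviewer and question are all illustrative assumptions.

    # Sketch of a mixed-effects logistic regression like the one described.
    # All column and file names are hypothetical, not from the study.
    import pandas as pd
    from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

    # One row per (reviewer, question, chatbot) response; 'preferred' is 1
    # if the reviewer chose that response as the best of the three.
    df = pd.read_csv("chatbot_ratings.csv")  # hypothetical input file

    # Fixed effects: the five rubric scores. Random intercepts for
    # reviewer and question account for repeated measures.
    model = BinomialBayesMixedGLM.from_formula(
        "preferred ~ correctness + completeness + safety + usefulness + references",
        vc_formulas={"reviewer": "0 + C(reviewer)", "question": "0 + C(question)"},
        data=df,
    )
    result = model.fit_vb()  # variational Bayes fit
    print(result.summary())

Note that statsmodels fits this model by approximate Bayesian inference (variational Bayes); a frequentist fit, e.g. lme4::glmer in R, would be an equally plausible choice and may be what the authors actually used.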