Arora Vikram, Silburt Joseph, Phillips Mark, Khan Moin, Petrisor Brad, Chaudhry Harman, Mundi Raman, Bhandari Mohit
Department of Surgery, McMaster University, Hamilton, CAN.
Department of Orthopaedic Surgery, University of Toronto, Toronto, CAN.
Cureus. 2024 Jul 25;16(7):e65343. doi: 10.7759/cureus.65343. eCollection 2024 Jul.
Objective: To compare the quality of responses from three chatbots (ChatGPT, Bing Chat, and AskOE) across a range of orthopaedic surgery therapeutic treatment questions.

Design: We identified a series of treatment-related questions spanning a range of orthopaedic surgery subspecialties. Each question was entered identically into each of the three chatbots (ChatGPT, Bing Chat, and AskOE), and the responses were reviewed using a standardized rubric.

Participants: Orthopaedic surgery experts affiliated with McMaster University and the University of Toronto reviewed all responses in a blinded fashion.

Outcomes: The primary outcomes were scores on a five-item assessment tool covering clinical correctness, clinical completeness, safety, usefulness, and references. The secondary outcome was the reviewers' preferred response for each question. We performed a mixed-effects logistic regression to identify factors associated with selecting a preferred chatbot.

Results: Across all questions and answers, reviewers preferred AskOE significantly more often than both ChatGPT (P<0.001) and Bing Chat (P<0.001). AskOE also received significantly higher total evaluation scores than both ChatGPT (P<0.001) and Bing Chat (P<0.001). Further regression analysis showed that clinical correctness, clinical completeness, usefulness, and references were significantly associated with a preference for AskOE. Across all responses, four were judged to contain major errors: three from ChatGPT and one from AskOE.

Conclusions: Reviewers significantly preferred AskOE over ChatGPT and Bing Chat across a variety of measures on orthopaedic therapy questions. This technology has important implications in healthcare settings, as it provides access to trustworthy answers in orthopaedic surgery.
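The analysis hinges on a mixed-effects logistic regression with the binary "preferred" choice as the outcome and the five rubric scores as predictors. As a minimal sketch only, not the authors' actual code, the following Python example shows how such a model could be fit with statsmodels; the input file, the column names (preferred, correctness, completeness, safety, usefulness, references, reviewer, question), and the choice of random intercepts for reviewer and question are all illustrative assumptions.

    # Sketch of a mixed-effects logistic regression like the one described.
    # All column and file names are hypothetical, not from the study.
    import pandas as pd
    from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

    # One row per (reviewer, question, chatbot) response; 'preferred' is 1
    # if the reviewer chose that response as the best of the three.
    df = pd.read_csv("chatbot_ratings.csv")  # hypothetical input file

    # Fixed effects: the five rubric scores. Random intercepts for
    # reviewer and question account for repeated measures.
    model = BinomialBayesMixedGLM.from_formula(
        "preferred ~ correctness + completeness + safety + usefulness + references",
        vc_formulas={"reviewer": "0 + C(reviewer)", "question": "0 + C(question)"},
        data=df,
    )
    result = model.fit_vb()  # variational Bayes fit
    print(result.summary())

Note that statsmodels fits this model by approximate Bayesian inference (variational Bayes); a frequentist fit, e.g. lme4::glmer in R, would be an equally plausible choice and may be what the authors actually used.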