Boscolo-Rizzo Paolo, Marcuzzo Alberto Vito, Lazzarin Chiara, Giudici Fabiola, Polesel Jerry, Stellin Marco, Pettorelli Andrea, Spinato Giacomo, Ottaviano Giancarlo, Ferrari Marco, Borsetto Daniele, Zucchini Simone, Trabalzini Franco, Sia Egidio, Gardenal Nicoletta, Baruca Roberto, Fortunati Alfonso, Vaira Luigi Angelo, Tirelli Giancarlo
Department of Medical, Surgical and Health Sciences, Section of Otolaryngology, University of Trieste, Trieste, Italy.
Unit of Cancer Epidemiology, Centro di Riferimento Oncologico di Aviano (CRO) Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Aviano, Italy.
Clin Otolaryngol. 2025 Mar;50(2):330-335. doi: 10.1111/coa.14261. Epub 2024 Dec 4.
Artificial intelligence (AI) systems are changing the way information is accessed and consumed globally. This study aims to evaluate the quality of the information provided by the AI chatbots ChatGPT4 and Claude2 concerning reconstructive surgery for head and neck cancer.
Thirty questions on reconstructive surgery for head and neck cancer were posed to both AIs, and 16 head and neck surgeons assessed the responses using the QAMAI questionnaire. A 5-point Likert scale was used to rate accuracy, clarity, relevance, completeness, sources, and usefulness. Questions were categorised into those suitable for patients (group 1) and those suitable for surgeons (group 2). AI responses were compared using Student's t-test and the McNemar test. Agreement among surgeon scores was measured with the intraclass correlation coefficient (ICC), and readability was assessed with the Flesch-Kincaid Grade Level (FKGL).
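A minimal sketch of how these three comparisons could be reproduced in Python, assuming hypothetical per-question score data; the paper does not publish its analysis code, so the data layout, column names, and the acceptability threshold used for the McNemar table below are illustrative only (scipy for the paired Student's t-test, statsmodels for McNemar, pingouin for the ICC):

```python
# Hypothetical reproduction sketch; all data below are simulated placeholders.
import numpy as np
import pandas as pd
import pingouin as pg
from scipy.stats import ttest_rel
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)

# Hypothetical per-question mean QAMAI scores (30 questions, one row each).
scores = pd.DataFrame({
    "chatgpt4": rng.normal(4.0, 0.5, 30),
    "claude2": rng.normal(3.9, 0.5, 30),
})

# Paired Student's t-test: the same 30 questions were answered by both AIs.
t_stat, p_val = ttest_rel(scores["chatgpt4"], scores["claude2"])
print(f"paired t-test: t={t_stat:.2f}, p={p_val:.3f}")

# McNemar test on paired binary outcomes (here, an assumed "acceptable"
# cutoff of >= 4) arranged as a 2x2 contingency table.
gpt_ok = scores["chatgpt4"] >= 4
claude_ok = scores["claude2"] >= 4
table = pd.crosstab(gpt_ok, claude_ok).reindex(
    index=[True, False], columns=[True, False], fill_value=0
)
print(mcnemar(table.values, exact=True))

# ICC for agreement among the 16 raters: long-format frame with one row
# per (question, rater) pair, ratings on the 5-point Likert scale.
long = pd.DataFrame({
    "question": np.repeat(np.arange(30), 16),
    "rater": np.tile(np.arange(16), 30),
    "rating": rng.integers(1, 6, 30 * 16),
})
icc = pg.intraclass_corr(
    data=long, targets="question", raters="rater", ratings="rating"
)
print(icc[["Type", "ICC", "CI95%"]])
```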
ChatGPT4 and Claude2 had similar overall mean scores for accuracy, clarity, relevance, completeness and usefulness, while Claude2 outperformed ChatGPT4 on sources (110.0 vs. 92.1, p < 0.001). Within group 2, Claude2 showed significantly lower accuracy and completeness scores than ChatGPT4 (p = 0.003 and p = 0.002, respectively). Regarding readability, ChatGPT4 produced less complex text than Claude2 (mean FKGL 4.57 vs. 6.05, p < 0.001), with 93% of its responses written in easy to fairly easy English.
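For reference, the FKGL is a standard readability formula, FKGL = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59, where a lower score indicates simpler text; a mean of 4.57 corresponds roughly to a US fifth-grade reading level, versus roughly sixth grade for Claude2's 6.05. A short sketch using the textstat Python package (the sample sentence is hypothetical, not drawn from the study):

```python
# FKGL via the textstat package; lower score = easier to read.
import textstat

sample = (
    "A free flap moves tissue, with its blood vessels, from one part of "
    "the body to rebuild the area removed during cancer surgery."
)
print(textstat.flesch_kincaid_grade(sample))
```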
Our findings indicate that neither chatbot exhibits decisive superiority in all aspects. Nonetheless, ChatGPT4 demonstrated greater accuracy and completeness for specific types of questions, and its simpler language may better serve patient inquiries. However, many evaluators disagreed with the information provided by the chatbots, underscoring that AI systems cannot substitute for the advice of medical professionals.