Reliability of large language models for advanced head and neck malignancies management: a comparison between ChatGPT 4 and Gemini Advanced.

Affiliations

Division of Otolaryngology, Department of Surgical Sciences, Università degli Studi di Torino, Turin, Italy.

Otolaryngology Unit, Santi Paolo e Carlo Hospital, Department of Health Sciences, Università degli Studi di Milano, Milan, Italy.

Publication information

Eur Arch Otorhinolaryngol. 2024 Sep;281(9):5001-5006. doi: 10.1007/s00405-024-08746-2. Epub 2024 May 25.

DOI:10.1007/s00405-024-08746-2

PMID:38795148

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11392976/

Abstract

PURPOSE

This study evaluates the efficacy of two advanced Large Language Models (LLMs), OpenAI's ChatGPT 4 and Google's Gemini Advanced, in providing treatment recommendations for head and neck oncology cases. The aim is to assess their utility in supporting multidisciplinary oncological evaluations and decision-making processes.

METHODS

This comparative analysis examined the responses of ChatGPT 4 and Gemini Advanced to five hypothetical cases of head and neck cancer, each representing a different anatomical subsite. The responses were evaluated against the latest National Comprehensive Cancer Network (NCCN) guidelines by two blinded panels using the total disagreement score (TDS) and the artificial intelligence performance instrument (AIPI). Statistical assessments were performed using the Wilcoxon signed-rank test and the Friedman test.
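The two tests named above can be illustrated with a minimal pure-Python sketch of their test statistics. Everything here is an assumption for illustration: the function names are hypothetical, the scores are made-up values rather than study data, and the abstract does not say which scores each test was applied to. P-values and tie corrections are omitted; in practice a statistics package such as SciPy (`scipy.stats.wilcoxon`, `scipy.stats.friedmanchisquare`) would normally be used.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


def friedman_statistic(blocks):
    """Friedman chi-square (no tie correction); rows are blocks, columns are conditions."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        # Rank the conditions within each block, then accumulate per condition.
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical paired scores for two models on five cases (NOT study data).
model_a = [3, 4, 2, 4, 3]
model_b = [2, 3, 2, 2, 3]
w_plus, w_minus = wilcoxon_signed_rank(model_a, model_b)

# Hypothetical scores from three raters (columns) on four cases (rows).
rater_scores = [[1, 2, 3], [1, 3, 2], [1, 2, 3], [2, 1, 3]]
chi_square = friedman_statistic(rater_scores)

print(w_plus, w_minus, chi_square)
```

The Wilcoxon signed-rank test suits paired comparisons of two conditions (for example, two models scored on the same cases), while the Friedman test extends the same idea to three or more related samples.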

RESULTS

Both LLMs produced relevant treatment recommendations, with ChatGPT 4 generally outperforming Gemini Advanced in guideline adherence and comprehensive treatment planning. ChatGPT 4 achieved higher AIPI scores (median 3 [2-4]) than Gemini Advanced (median 2 [2-3]), indicating better overall performance. Notably, inconsistencies were observed in the management of induction chemotherapy and in surgical decisions such as neck dissection.
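The abstract does not state what the bracketed ranges denote. Assuming they are interquartile ranges, a summary in the same "median [Q1-Q3]" form could be computed as in the following sketch. The scores are made-up illustrative values, not study data, and `summarize` is a hypothetical helper name.

```python
import statistics


def summarize(scores):
    """Return (median, q1, q3) using inclusive quartiles (an assumed convention)."""
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q2, q1, q3


# Hypothetical ordinal scores (NOT study data).
example_scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]
median, q1, q3 = summarize(example_scores)

# Format in the same style as the abstract: median [Q1-Q3].
summary = f"median {median:g} [{q1:g}-{q3:g}]"
print(summary)
```

Quartile conventions differ between statistical packages, so the exact bounds can vary slightly with the method chosen.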

CONCLUSIONS

While both LLMs demonstrated the potential to aid in the multidisciplinary management of head and neck oncology, discrepancies in certain critical areas highlight the need for further refinement. The study supports the growing role of AI in enhancing clinical decision-making but also emphasizes that continuous updates and validation against current clinical standards are needed to fully integrate AI into healthcare practice.
