Assessment of decision-making with locally run and web-based large language models versus human board recommendations in otorhinolaryngology, head and neck surgery.

Authors

Buhr Christoph Raphael, Ernst Benjamin Philipp, Blaikie Andrew, Smith Harry, Kelsey Tom, Matthias Christoph, Fleischmann Maximilian, Jungmann Florian, Alt Jürgen, Brandts Christian, Kämmerer Peer W, Foersch Sebastian, Kuhn Sebastian, Eckrich Jonas

Affiliations

Department of Otorhinolaryngology, University Medical Center of the Johannes Gutenberg-University Mainz, Langenbeckstraße 1, 55131, Mainz, Germany.

School of Medicine, University of St Andrews, St Andrews, UK.

Publication information

Eur Arch Otorhinolaryngol. 2025 Mar;282(3):1593-1607. doi: 10.1007/s00405-024-09153-3. Epub 2025 Jan 10.

DOI: 10.1007/s00405-024-09153-3
PMID: 39792200
Full text (PMC): https://pmc.ncbi.nlm.nih.gov/articles/PMC11890241/
Abstract

INTRODUCTION

Tumor boards are a cornerstone of modern cancer treatment. Given the advanced capabilities of large language models (LLMs), their role in generating tumor board decisions for otorhinolaryngology (ORL), head and neck surgery is gaining increasing attention. However, concerns over data protection and the use of confidential patient information in web-based LLMs have restricted their widespread adoption and hindered the exploration of their full potential. In this first study of its kind, we compared standard human multidisciplinary tumor board (MDT) recommendations with those of a web-based LLM (ChatGPT-4o) and a locally run LLM (Llama 3), the latter addressing data protection concerns.

MATERIAL AND METHODS

Twenty-five simulated tumor board cases were presented to an MDT composed of specialists from otorhinolaryngology, craniomaxillofacial surgery, medical oncology, radiology, radiation oncology, and pathology. This multidisciplinary team provided a comprehensive analysis of the cases. The same cases were input into ChatGPT-4o and Llama 3 using structured prompts, and the concordance between the LLMs' and MDT's recommendations was assessed. Four MDT members evaluated the LLMs' recommendations in terms of medical adequacy (using a six-point Likert scale) and whether the information provided could have influenced the MDT's original recommendations.

RESULTS

In distinguishing between curative and palliative treatment strategies, ChatGPT-4o showed 84% concordance (21/25 cases) and Llama 3 showed 92% concordance (23/25 cases) with the MDT. ChatGPT-4o identified all first-line therapy options considered by the MDT in 64% of cases (16/25) and Llama 3 in 60% of cases (15/25), though with varying priority. ChatGPT-4o presented all the MDT's first-line therapies in 52% of cases (13/25), while Llama 3 offered a homologous treatment strategy in 48% of cases (12/25). Additionally, both models proposed at least one of the MDT's first-line therapies as their top recommendation in 28% of cases (7/25). The ratings for medical adequacy yielded a mean score of 4.7 (IQR: 4-6) for ChatGPT-4o and 4.3 (IQR: 3-5) for Llama 3. In 17% of the assessments (33/200), MDT members indicated that the LLM recommendations could potentially enhance the MDT's decisions.
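
The percentages reported above can be re-derived from the stated case counts. The following is a minimal sketch, not part of the study: the helper `percent_half_up` and the dictionary labels are hypothetical names introduced here, and the breakdown of the 200 assessments as 4 raters x 25 cases x 2 models is an assumption inferred from the methods, since the abstract does not spell it out.

```python
def percent_half_up(count: int, total: int) -> int:
    """Whole-number percentage with halves rounded up (16.5 -> 17).

    Integer arithmetic is used because Python's built-in round()
    rounds halves to the nearest even number (round(16.5) == 16).
    """
    return (200 * count + total) // (2 * total)


TOTAL_CASES = 25  # simulated tumor board cases, as stated in the abstract

# Case counts copied from the results; the labels are illustrative only.
case_counts = {
    "ChatGPT-4o: curative vs. palliative concordance": 21,
    "Llama 3: curative vs. palliative concordance": 23,
    "ChatGPT-4o: all first-line options identified": 16,
    "Llama 3: all first-line options identified": 15,
    "ChatGPT-4o: all first-line therapies presented": 13,
    "Llama 3: homologous treatment strategy": 12,
    "Both models: MDT first-line therapy as top recommendation": 7,
}

case_percentages = {
    label: percent_half_up(count, TOTAL_CASES)
    for label, count in case_counts.items()
}

# Assumption (not stated explicitly in the abstract):
# 4 MDT raters x 25 cases x 2 models = 200 assessments.
RATERS, MODELS = 4, 2
total_assessments = RATERS * TOTAL_CASES * MODELS
enhancement_pct = percent_half_up(33, total_assessments)

for label, pct in case_percentages.items():
    print(f"{label}: {case_counts[label]}/{TOTAL_CASES} = {pct}%")
print(f"Potentially enhancing assessments: 33/{total_assessments} = {enhancement_pct}%")
```

Under these assumptions every derived value matches the reported figure; note that 33/200 is exactly 16.5%, which corresponds to the reported 17% only when halves are rounded up.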

DISCUSSION

This study demonstrates the capability of both LLMs to provide viable therapeutic recommendations in ORL, head and neck surgery. Llama 3, operating locally, bypasses many data protection issues and shows promise as a clinical tool to support MDT decisions. However, at present, LLMs should augment rather than replace human decision-making.

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c874/11890241/c54a33d1e61a/405_2024_9153_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c874/11890241/43ad0073dc29/405_2024_9153_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c874/11890241/fb2930aaefe1/405_2024_9153_Fig3_HTML.jpg
