Evaluating ChatGPT-4o as a decision support tool in multidisciplinary sarcoma tumor boards: heterogeneous performance across various specialties.

Author information

Ammo Tekoshin, Guillaume Vincent G J, Hofmann Ulf Krister, Ulmer Norma M, Buenting Nina, Laenger Florian, Beier Justus P, Leypold Tim

Affiliations

Department of Plastic Surgery, Hand and Reconstructive Surgery, University Hospital Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Aachen, Germany.

Department of Orthopedics, Trauma and Reconstructive Surgery, Division of Arthroplasty, University Hospital Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Aachen, Germany.

Publication information

Front Oncol. 2025 Jan 17;14:1526288. doi: 10.3389/fonc.2024.1526288. eCollection 2024.

DOI:10.3389/fonc.2024.1526288

PMID:39896191

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11782276/

Abstract

BACKGROUND AND OBJECTIVES

Since the launch of ChatGPT in 2023, large language models have attracted substantial interest for deployment in the health care sector. This study evaluates the performance of ChatGPT-4o as a support tool for decision-making in multidisciplinary sarcoma tumor boards.

METHODS

We created five sarcoma patient cases mimicking real-world scenarios and prompted ChatGPT-4o to issue tumor board decisions. These recommendations were independently assessed by a multidisciplinary panel, consisting of an orthopedic surgeon, plastic surgeon, radiation oncologist, radiologist, and pathologist. Assessments were graded on a Likert scale from 1 (completely disagree) to 5 (completely agree) across five categories: understanding, therapy/diagnostic recommendation, aftercare recommendation, summarization, and support tool effectiveness.
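The scoring scheme described above can be illustrated with a minimal sketch. It assumes that each rater scores every case in all five categories and that a rater's mean is the plain average of those scores; the abstract does not state how the means were actually computed. The names `mean_by_rater`, `overall_mean`, and `example`, and all ratings shown, are hypothetical placeholders and are not data from the study.

```python
from statistics import mean

# The five assessment categories named in the Methods.
CATEGORIES = (
    "understanding",
    "therapy/diagnostic recommendation",
    "aftercare recommendation",
    "summarization",
    "support tool effectiveness",
)


def validate(score):
    """Reject anything outside the 1 (completely disagree) to 5 (completely agree) scale."""
    if not 1 <= score <= 5:
        raise ValueError(f"Likert score out of range: {score}")
    return score


def mean_by_rater(ratings):
    """Average all category scores across all cases, separately for each rater.

    `ratings` maps a rater label to a list of per-case dicts (category -> score).
    """
    return {
        rater: mean(validate(case[c]) for case in cases for c in CATEGORIES)
        for rater, cases in ratings.items()
    }


def overall_mean(ratings):
    """Average every individual score across all raters, cases, and categories."""
    return mean(
        validate(case[c])
        for cases in ratings.values()
        for case in cases
        for c in CATEGORIES
    )


# Hypothetical placeholder ratings: two raters, two cases each. NOT study data.
example = {
    "rater_a": [{c: 4 for c in CATEGORIES}, {c: 5 for c in CATEGORIES}],
    "rater_b": [{c: 2 for c in CATEGORIES}, {c: 3 for c in CATEGORIES}],
}

print(mean_by_rater(example))
print(overall_mean(example))
```

With five raters and five cases, as in the study design, `ratings` would hold five keys with five case dicts each. Averaging all individual scores and averaging the per-rater means give the same overall value only when every rater contributes the same number of scores, which this sketch assumes.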

RESULTS

The mean score for ChatGPT-4o performance was 3.76, indicating moderate effectiveness. Surgical specialties received the highest score, with a mean score of 4.48, while diagnostic specialties (radiology/pathology) performed considerably better than the radiation oncology specialty, which performed poorly.

CONCLUSIONS

This study provides initial insights into the use of prompt-engineered large language models as decision support tools in sarcoma tumor boards. ChatGPT-4o recommendations regarding surgical specialties performed best while ChatGPT-4o struggled to give valuable advice in the other tested specialties. Clinicians should understand both the advantages and limitations of this technology for effective integration into clinical practice.
