Ammo Tekoshin, Guillaume Vincent G J, Hofmann Ulf Krister, Ulmer Norma M, Buenting Nina, Laenger Florian, Beier Justus P, Leypold Tim
Department of Plastic Surgery, Hand and Reconstructive Surgery, University Hospital Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Aachen, Germany.
Department of Orthopedics, Trauma and Reconstructive Surgery, Division of Arthroplasty, University Hospital Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Aachen, Germany.
Front Oncol. 2025 Jan 17;14:1526288. doi: 10.3389/fonc.2024.1526288. eCollection 2024.
Since the launch of ChatGPT in late 2022, large language models have attracted substantial interest for deployment in the health care sector. This study evaluates the performance of ChatGPT-4o as a decision-support tool for multidisciplinary sarcoma tumor boards.
We created five sarcoma patient cases mimicking real-world scenarios and prompted ChatGPT-4o to issue tumor board decisions. These recommendations were independently assessed by a multidisciplinary panel consisting of an orthopedic surgeon, a plastic surgeon, a radiation oncologist, a radiologist, and a pathologist. Each recommendation was graded on a Likert scale from 1 (completely disagree) to 5 (completely agree) across five categories: understanding, therapy/diagnostic recommendation, aftercare recommendation, summarization, and support tool effectiveness.
ChatGPT-4o achieved a mean overall score of 3.76, indicating moderate effectiveness. The surgical specialties received the highest ratings (mean 4.48), and the diagnostic specialties (radiology and pathology) performed considerably better than radiation oncology, which scored poorly.
This study provides initial insights into the use of prompt-engineered large language models as decision-support tools in sarcoma tumor boards. ChatGPT-4o performed best on recommendations for the surgical specialties but struggled to give valuable advice in the other specialties tested. Clinicians should understand both the advantages and the limitations of this technology for effective integration into clinical practice.