Schmidl Benedikt, Hütten Tobias, Pigorsch Steffi, Stögbauer Fabian, Hoch Cosima C, Hussain Timon, Wollenberg Barbara, Wirth Markus
Department of Otolaryngology Head and Neck Surgery, Technical University Munich, Munich, Germany.
Department of RadioOncology, Technical University Munich, Munich, Germany.
Front Oncol. 2024 Sep 5;14:1455413. doi: 10.3389/fonc.2024.1455413. eCollection 2024.
Recurrent and metastatic head and neck squamous cell carcinoma (HNSCC) requires complex therapeutic management that needs to be discussed in multidisciplinary tumor boards (MDT). While artificial intelligence (AI) has improved significantly in assisting healthcare professionals with informed treatment decisions for primary cases, its application in the even more complex recurrent/metastatic setting has not yet been evaluated. This study also represents the first evaluation of the recently released large language model (LLM) ChatGPT 4o, compared with ChatGPT 4.0, for providing therapy recommendations.
Each LLM generated therapy recommendations for 100 HNSCC cases (50 cases of recurrence and 50 cases of distant metastasis), which were evaluated by two independent reviewers. The primary outcome was the quality of the therapy recommendations, assessed by the following parameters: clinical recommendation, explanation, and summarization.
ChatGPT 4o and 4.0 provided mostly general answers regarding surgery, palliative care, or systemic therapy. ChatGPT 4o was 48.5% faster than ChatGPT 4.0. Both LLMs obtained high scores for clinical recommendation, explanation, and summarization, with no significant differences between them. Both nevertheless proved to be mostly an assisting tool that requires validation by an experienced clinician, owing to a lack of transparency and occasional recommendations of treatment modalities that are not part of the current treatment guidelines.
This research demonstrates that ChatGPT 4o and 4.0 perform similarly, while ChatGPT 4o is significantly faster. Because the current versions cannot tailor therapy recommendations, sometimes recommend incorrect treatment options, and provide no information on their source material, advanced AI models can at present merely assist in the MDT setting for recurrent/metastatic HNSCC.