Yan Danfang, Wang Lihong, Huang Liming, Cheng Kejia, Huang Yu, Bao Yangyang, Yin Xin, He Mengye, Zhu Huiyong, Yan Senxiang
Department of Radiation Oncology, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, Zhejiang, China.
Department of Oncology, The Affiliated People's Hospital, Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China.
Int J Cancer. 2025 Nov 1;157(9):1888-1897. doi: 10.1002/ijc.70001. Epub 2025 Jul 19.
This study evaluates ChatGPT-4's potential as a decision-support tool in the treatment of recurrent or metastatic head and neck squamous cell carcinoma (HNSCC). Twelve patients were selected retrospectively, each with detailed clinical, tumor, treatment-history, imaging, pathology, and symptom data. ChatGPT-4, 6 expert oncologists, and 10 junior oncologists assessed these cases. The AI model applied the 8th edition AJCC TNM criteria for tumor staging and proposed treatment strategies. Both expert and junior oncologists rated its performance on a 0-100 scale, and the ratings were further analyzed using summary statistics and intraclass correlation coefficients. ChatGPT-4 staged 10 of the 12 cases correctly (83.3% accuracy) and mis-staged two. Junior doctors rated its staging performance highly, with strong consensus on its language capabilities and moderate consensus on its value as a learning aid. For treatment strategy, experts showed high agreement in their ratings of subject knowledge (median 86, mean 84.7), logical reasoning (median 83, mean 82), and analytical skills (median 85, mean 82), and moderate agreement on ChatGPT-4's usefulness for treatment decisions (median 80, mean 77) and on their agreement with its recommendations (median 80, mean 76.8). Junior doctors rated ChatGPT-4's treatment strategy higher (medians of 85 or above) but with limited consensus (subject knowledge: median 88, mean 84.5; logical reasoning: median 90, mean 83.2; analytical skills: median 90, mean 82.5; usefulness: median 85, mean 81.8; agreement with recommendations: median 85, mean 80.4). ChatGPT-4 is proficient in tumor staging but only moderately effective in treatment recommendations. Nonetheless, it shows promise as a supportive tool for clinicians, particularly those with less experience, in making informed treatment decisions.
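The two quantities the abstract leans on, staging accuracy and rater agreement, can be illustrated with a minimal Python sketch. The abstract does not state which form of the intraclass correlation coefficient was used, so the two-way random-effects, single-rater form ICC(2,1) of Shrout and Fleiss is assumed here. The function names `staging_accuracy` and `icc2_1` and the toy ratings are hypothetical and are not taken from the study.

```python
from statistics import mean


def staging_accuracy(n_cases, n_errors):
    """Fraction of cases staged correctly."""
    return (n_cases - n_errors) / n_cases


def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per case, one column per rater.
    This ICC form is an assumption; the abstract does not name the one used.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: cases (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # 12 cases with 2 mis-staged, as reported in the abstract.
    print(f"staging accuracy: {100 * staging_accuracy(12, 2):.1f}%")

    # Hypothetical 0-100 ratings (4 cases x 3 raters), for illustration only.
    toy = [[85, 80, 90], [70, 75, 72], [90, 88, 95], [60, 65, 58]]
    print(f"ICC(2,1) on toy ratings: {icc2_1(toy):.2f}")
```

An ICC near 1 means the raters score the cases almost identically, while a low value means the scores depend heavily on who did the rating, which is how "high", "moderate", and "limited" consensus in the abstract would be read under this assumption.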