Schmidl Benedikt, Hoch Cosima C, Walter Robert, Wirth Markus, Wollenberg Barbara, Hussain Timon
Department of Otolaryngology Head and Neck Surgery, Technical University Munich, Munich, Germany.
Department of Diagnostic and Interventional Radiology, Technical University Munich, Munich, Germany.
Discov Oncol. 2025 May 30;16(1):956. doi: 10.1007/s12672-025-02798-4.
Accurate preoperative detection and analysis of lymph node metastasis (LNM) in head and neck squamous cell carcinoma (HNSCC) is essential for planning and performing a neck dissection and may directly affect patient morbidity and prognosis. In addition, predicting extranodal extension (ENE) on preoperative imaging could be particularly valuable in HPV-positive oropharyngeal squamous cell carcinoma. It would enable more accurate patient counseling and, where appropriate, support the decision to favor primary chemoradiotherapy over immediate neck dissection. Radiological images are currently evaluated by radiologists and head and neck oncologists, and automated image interpretation is not part of the standard of care. This exploratory study therefore evaluated the value of preoperative image recognition by artificial intelligence (AI), using the large language model (LLM) ChatGPT-4 V. The evaluation used neck computed tomography (CT) images of HNSCC patients with cervical LNM and corresponding images without LNM. The study had two objectives. The first was to assess preoperative rater accuracy by comparing clinicians' assessments of imaging-detected extranodal extension (iENE) and of the extent of neck dissection with the AI predictions. The second was to evaluate pathology-based accuracy by comparing the AI predictions with the final histopathological outcomes.
Forty-five preoperative CT scans were retrospectively analyzed. In 15 cases a selective neck dissection (sND) was performed, in 15 cases a radical neck dissection (mrND) followed, and 15 cases without LNM underwent sND. Image analysis was based on three single images, which were provided both to ChatGPT-4 V and to the head and neck surgeons acting as reviewers. Final pathological findings were available for all cases because all HNSCC patients had undergone surgery. ChatGPT-4 V was tasked with determining the extent of LNM on the preoperative CT images, recommending the extent of neck dissection, and detecting iENE. Two head and neck surgeons independently reviewed the diagnostic performance of ChatGPT-4 V and assessed its accuracy, sensitivity, and specificity.
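Accuracy, sensitivity, and specificity follow the standard confusion-matrix definitions. The sketch below is a minimal illustration of those definitions. It assumes binary per-case calls, for example "mrND recommended" or "iENE present", scored against histopathology as the reference standard. The helper names and the example labels are hypothetical and are not the study's data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary rater's calls against a reference standard."""
    tp: int  # rater positive, reference positive
    fp: int  # rater positive, reference negative
    tn: int  # rater negative, reference negative
    fn: int  # rater negative, reference positive


def confusion_from_labels(predicted: Sequence[bool], reference: Sequence[bool]) -> Confusion:
    """Tally a confusion matrix from paired per-case calls (hypothetical helper)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    return Confusion(tp, fp, tn, fn)


def sensitivity(c: Confusion) -> float:
    """Share of reference-positive cases the rater called positive."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Share of reference-negative cases the rater called negative."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    """Share of all cases on which the rater agreed with the reference."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


if __name__ == "__main__":
    # Hypothetical rater that flags every true positive and also most negatives.
    reference = [True] * 3 + [False] * 6
    predicted = [True] * 3 + [True] * 4 + [False] * 2
    c = confusion_from_labels(predicted, reference)
    print(c, sensitivity(c), specificity(c), accuracy(c))
```

A rater that over-calls disease keeps sensitivity at 100% because it produces no false negatives. Its specificity falls, however, because each reference-negative case it calls positive becomes a false positive.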
In this study, ChatGPT-4 V reached a sensitivity of 100% and a specificity of 34.09% in identifying the need for a radical neck dissection on neck CT images. Its sensitivity and specificity for detecting iENE were 100% and 34.15%, respectively. Both human reviewers achieved higher specificity. Notably, ChatGPT-4 V also recommended an mrND and reported iENE on CT images without any cervical LNM.
In this exploratory study of 45 preoperative neck CT scans obtained before neck dissection, ChatGPT-4 V substantially overestimated the extent and severity of lymph node metastasis in head and neck cancer. These results indicate that ChatGPT-4 V may not yet add value to surgical planning in head and neck cancer. However, the unparalleled speed of its analysis and the well-founded reasoning it provided suggest that AI tools may do so in the future.