Ding Qunzhe, Wang Chendong, Zhang Zhe, Liao Junjie, Tang Lufan, Lu Jiade Jay, Tan Zhibo
School of Information Management, Wuhan University, Wuhan, China.
Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
Transl Lung Cancer Res. 2025 Jun 30;14(6):2011-2030. doi: 10.21037/tlcr-2025-152. Epub 2025 Jun 26.
For individual patients with early-stage non-small cell lung cancer (NSCLC), robust evidence to guide treatment selection between surgery and stereotactic body radiotherapy (SBRT) remains limited. This study aimed to develop machine learning-driven predictive models using the Surveillance, Epidemiology, and End Results (SEER) database to evaluate the efficacy of these treatments, thereby providing a data-driven foundation for personalized treatment decisions.
Patients with stage I or IIA NSCLC diagnosed between 2012 and 2018 were identified from the SEER database. Six machine learning models, ranging from classical to advanced approaches, were used to predict 1-, 3-, and 5-year survival, and their performance was assessed using seven metrics. The SHAP (SHapley Additive exPlanations) interpretability method was used to explain the best-performing model, with a focus on how the difference between surgery and radiotherapy varied across patient and tumor factors, to inform the optimization of treatment strategies. Patients diagnosed between 2019 and 2021 served as an external validation cohort to assess the generalizability and robustness of the developed models.
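The cohort definition above (a 2012-2018 development cohort, a 2019-2021 external validation cohort, and binary 1-, 3-, and 5-year survival targets) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Patient` fields are hypothetical names, and excluding patients who were censored before the horizon is an assumption, since the abstract does not state how censoring was handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    # Hypothetical field names; real SEER exports use different column labels.
    year_of_diagnosis: int
    survival_months: int
    died: bool  # True if death was recorded during follow-up


def survival_label(p: Patient, horizon_years: int) -> Optional[int]:
    """Return 1 if alive at the horizon, 0 if dead before it,
    None if follow-up ended before the horizon without a recorded death.

    Assumption: censored-before-horizon patients are dropped for that horizon.
    """
    horizon_months = 12 * horizon_years
    if p.survival_months >= horizon_months:
        return 1
    if p.died:
        return 0
    return None


def temporal_split(patients: List[Patient]) -> Tuple[List[Patient], List[Patient]]:
    """Split by diagnosis year, using the year ranges given in the abstract."""
    development = [p for p in patients if 2012 <= p.year_of_diagnosis <= 2018]
    external = [p for p in patients if 2019 <= p.year_of_diagnosis <= 2021]
    return development, external


# Toy records for illustration only (not real data).
toy_cohort = [
    Patient(2013, 40, True),   # died at 40 months
    Patient(2015, 70, False),  # alive at 70 months
    Patient(2020, 20, False),  # censored at 20 months
]
development, external = temporal_split(toy_cohort)
labels_3y = [survival_label(p, 3) for p in toy_cohort]
labels_5y = [survival_label(p, 5) for p in toy_cohort]
```

With labels built per horizon, a separate classifier can be fitted for each of the 1-, 3-, and 5-year targets on the development cohort and then scored on the external cohort without refitting.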
A total of 26,566 patients were included in the training and internal testing cohort. LightGBM (light gradient boosting machine) outperformed the other models across most metrics for survival prediction. The SHAP analysis identified tumor location, tumor size, pathology, and treatment type as important factors for the 3- and 5-year predictions. At 3 and 5 years, the efficacy of radiotherapy was comparable to that of surgery for left upper lobe tumors, whereas radiotherapy appeared slightly inferior to surgery for right lower lobe tumors. For tumors <1.5 cm or 3.5-5 cm, lobectomy showed the best efficacy, whereas for tumors measuring 1.5-3.5 cm, lobectomy appeared slightly inferior to radiotherapy and sublobar resection. For adenocarcinoma and squamous cell carcinoma, radiotherapy and lobectomy, respectively, could be regarded as the preferred treatments. In addition, for patients <45 or >75 years old, sublobar resection showed the best efficacy at 5 years. The external validation cohort of 11,927 patients further confirmed the effectiveness of the models in predicting 1-, 3-, and 5-year survival outcomes, supporting their reliability and applicability in clinical decision-making.
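To make the SHAP-based attributions concrete, the sketch below computes exact Shapley values by enumerating feature coalitions for a toy scoring function. It is an illustration under stated assumptions only: `toy_survival_score`, its coefficients, and the feature names are invented for the example, and brute-force enumeration stands in for the tree-specific algorithm that SHAP libraries typically use with gradient-boosted models. Features outside a coalition are set to a single baseline value, which is one common simplification.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values of each feature for one prediction.

    The value of a coalition is the model output when features in the
    coalition take the patient's values and all others take the baseline.
    """
    names = list(x)
    n = len(names)

    def value(coalition: set) -> float:
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for r in range(n):
            # Shapley weight for coalitions of size r that exclude `name`.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_survival_score(f: Dict[str, float]) -> float:
    # Hypothetical model with a treatment-by-size interaction; not fitted to data.
    return 0.5 + 0.1 * f["lobectomy"] - 0.05 * f["size_cm"] \
        + 0.02 * f["lobectomy"] * f["size_cm"]


patient = {"lobectomy": 1.0, "size_cm": 4.0}
reference = {"lobectomy": 0.0, "size_cm": 2.0}
phi = shapley_values(toy_survival_score, patient, reference)

# Efficiency property: attributions sum to the gap between the two predictions.
gap = toy_survival_score(patient) - toy_survival_score(reference)
```

Because the toy score contains a treatment-by-size interaction, the attribution for `lobectomy` depends on tumor size, which is the kind of factor-dependent treatment effect the analysis above examines.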
This study provides insights into treatment decision-making for stage I and IIA NSCLC. The LightGBM model is a reliable tool for survival prediction in early-stage NSCLC. Analysis with this model suggests that tumor location, tumor size, pathological type, and age are key factors influencing the choice of treatment.