Tegaw Eyachew Misganew, Asfaw Betelhem Bizuneh
Department of Physics, College of Natural and Computational Sciences, Debre Tabor University, Debre Tabor, Ethiopia.
Department of Health System Management and Health Economics, School of Public Health, College of Medicine and Health Sciences, Bahir Dar University, Bahir Dar, Ethiopia.
Semin Oncol. 2025 Jun;52(3):152364. doi: 10.1016/j.seminoncol.2025.152364. Epub 2025 May 24.
Treatment outcomes in lung cancer are highly variable, and machine learning (ML) models can provide valuable insight into how clinical and biochemical factors influence survival across different treatments. This study investigates patient survival after four major lung cancer treatments, using SHapley Additive exPlanations (SHAP) to interpret the impact of biomarkers on survival. We analyzed 23,658 lung cancer patient records derived from a Kaggle dataset. ML models were trained on the most relevant clinical and biochemical variables to study survival outcomes for each treatment. SHAP analysis revealed the major survival predictors for each treatment. Survival outcomes are visualized in SHAP waterfall plots as f(x) (the predicted survival) and E[f(x)] (the baseline expectation). The best-performing model was Gradient Boosting, with an accuracy of 88.99%, precision of 89.06%, recall of 88.99%, F1-score of 88.91%, and area under the receiver operating characteristic curve (AUC-ROC) of 0.9332. Chemotherapy was associated with a positive effect on survival; the key contributors were phosphorus levels (+0.05), low alanine aminotransferase levels (+0.04), and low glucose levels (+0.04). Targeted therapy and radiation were associated with worse survival, while surgery was favorable, especially in cases with high white blood cell and lactate dehydrogenase (LDH) levels. SHAP-based ML analysis highlights how clinical and biochemical factors influence survival. These findings indicate that ML-driven interpretability might support personalized treatment approaches in lung cancer.
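The relationship between f(x) and E[f(x)] in a SHAP waterfall plot can be illustrated with a small sketch. The example below is not the study's pipeline: the scoring function `toy_model`, the three feature names, and all numeric values are hypothetical and chosen only for illustration. It computes exact Shapley values by enumerating feature coalitions against a small background set, then checks the additive property that a waterfall plot draws, namely that the baseline plus the per-feature contributions equals the prediction.

```python
from itertools import combinations
from math import factorial


def toy_model(row):
    """Hypothetical survival score; NOT the study's Gradient Boosting model."""
    phosphorus, alt, glucose = row
    return 0.5 + 0.10 * phosphorus - 0.05 * alt - 0.02 * glucose * alt


def coalition_value(model, x, background, subset):
    """Mean prediction when features in `subset` take the patient's values
    and the remaining features are drawn from each background row."""
    total = 0.0
    for b in background:
        mixed = [x[i] if i in subset else b[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


# Hypothetical, normalized feature values: (phosphorus, ALT, glucose).
background = [(1.0, 0.8, 1.2), (0.9, 1.1, 0.7), (1.2, 0.5, 1.0), (0.7, 1.4, 0.9)]
patient = (1.3, 0.4, 0.6)

baseline = coalition_value(toy_model, patient, background, set())  # E[f(x)]
prediction = toy_model(patient)                                    # f(x)
phi = shapley_values(toy_model, patient, background)

for name, value in zip(("phosphorus", "ALT", "glucose"), phi):
    print(f"{name:>10}: {value:+.4f}")
print(f"E[f(x)] = {baseline:.4f}, f(x) = {prediction:.4f}")
```

Signed values such as +0.05 or +0.04 in the abstract are read the same way: each is one feature's contribution that moves the prediction away from the baseline E[f(x)] toward f(x). Exact enumeration is only practical for a handful of features; for tree ensembles, the SHAP library provides faster estimators.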