Department of Urology, Shanghai Changhai Hospital, Naval Medical University, Shanghai, China.
Cancer Med. 2024 Jun;13(11):e7324. doi: 10.1002/cam4.7324.
We developed explainable machine learning models to predict the overall survival (OS) of patients with retroperitoneal liposarcoma (RLPS), with the aim of making our modeling results more explainable and transparent.
We collected clinicopathological information on RLPS patients from the Surveillance, Epidemiology, and End Results (SEER) database and allocated them to training and validation sets in a 7:3 ratio. In addition, we obtained an external validation cohort from the First Affiliated Hospital of Naval Medical University (Shanghai, China). We performed LASSO regression and multivariate Cox proportional hazards analysis to identify relevant risk factors. We then used these factors to develop six machine learning (ML) models: a Cox proportional hazards model (Coxph), random survival forest (RSF), ranger, gradient boosting with component-wise linear models (GBM), decision trees, and boosting trees. We evaluated the predictive performance of these ML models using the concordance index (C-index), the integrated cumulative/dynamic area under the curve (AUC), the integrated Brier score, and Cox-Snell residual plots. To provide a global explanation of the optimal model, we used time-dependent variable importance, partial dependence survival plots, and aggregated survival SHapley Additive exPlanations (SurvSHAP) plots. To provide a local explanation of the optimal model, we used SurvSHAP(t) and survival local interpretable model-agnostic explanations (SurvLIME) plots.
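The C-index ranks the models by how often a model orders two patients' predicted risks in the same way as their observed survival. The abstract does not state which C-index variant or software the authors used. As an illustration only, here is a minimal Python sketch of one common variant, Harrell's C-index for right-censored data. The function name `harrell_c_index` and the toy data are hypothetical. For simplicity, pairs with tied follow-up times are skipped.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data (sketch).

    times:  observed follow-up time for each patient
    events: 1 if death was observed, 0 if the patient was censored
    risk:   model-predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair is comparable only if the patient with the shorter time had an
        # observed event; censored patients cannot anchor a comparison.
        if not events[i]:
            continue
        for j in range(n):
            # Simplification: tied follow-up times are not counted as comparable.
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # earlier death was assigned higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy cohort: four patients, the third one censored.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    print(harrell_c_index(times, events, [0.9, 0.7, 0.3, 0.1]))  # 1.0 (perfect ranking)
    print(harrell_c_index(times, events, [0.9, 0.2, 0.3, 0.1]))  # 0.8 (one discordant pair)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Time-dependent metrics such as the cumulative/dynamic AUC and the integrated Brier score also require weighting for censoring, which this sketch does not attempt.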
The final ML models consist of seven factors: patient age, gender, marital status, and surgical history, as well as tumor histopathological classification, histological grade, and SEER stage. Our prognostic model shows strong discriminative ability, with the ranger model performing best. In the training, validation, and external validation sets, the AUCs for 1-, 3-, and 5-year OS all exceed 0.83, and the integrated Brier scores are consistently below 0.15. Explainability analysis of the ranger model further indicates that histological grade, histopathological classification, and age are the most influential predictors of OS.
The ranger ML prognostic model shows the best performance and can be used to predict OS in RLPS patients, giving clinicians a valuable reference for making informed decisions in advance.