Explainable machine learning predicts survival of retroperitoneal liposarcoma: A study based on the SEER database and external validation in China.

Affiliation

Department of Urology, Shanghai Changhai Hospital, Naval Medical University, Shanghai, China.

Publication information

Cancer Med. 2024 Jun;13(11):e7324. doi: 10.1002/cam4.7324.

DOI:10.1002/cam4.7324

PMID:38847519

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11157677/

Abstract

OBJECTIVE

We have developed explainable machine learning models to predict the overall survival (OS) of retroperitoneal liposarcoma (RLPS) patients. This approach aims to enhance the explainability and transparency of our modeling results.

METHODS

We collected clinicopathological information on RLPS patients from the Surveillance, Epidemiology, and End Results (SEER) database and split the patients into training and validation sets at a 7:3 ratio. We also obtained an external validation cohort from The First Affiliated Hospital of Naval Medical University (Shanghai, China). We performed LASSO regression and multivariate Cox proportional hazards analysis to identify relevant risk factors, which were then used to develop six machine learning (ML) models: Cox proportional hazards model (Coxph), random survival forest (RSF), ranger, gradient boosting with component-wise linear models (GBM), decision trees, and boosting trees. The predictive performance of these ML models was evaluated using the concordance index (C-index), the integrated cumulative/dynamic area under the curve (AUC), the integrated Brier score, and the Cox-Snell residual plot. For a global explanation of the optimal model, we used time-dependent variable importance, partial dependence survival plots, and aggregated survival SHapley Additive exPlanations (SurvSHAP) plots. For a local explanation of the optimal model, we used SurvSHAP(t) and survival local interpretable model-agnostic explanations (SurvLIME) plots.
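As an illustration of the concordance index mentioned above, the following is a minimal sketch of Harrell's C-index for right-censored data, written in plain Python. It is not the authors' code; the function name `harrell_c_index` and the toy follow-up data are hypothetical, and tied survival times are simply skipped for brevity.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    risk_scores: model output where a higher score means a worse prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # simplification: ignore pairs with tied times
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the patient who died earlier had the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: months of follow-up, event flags, predicted risk.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.2]
toy_c_index = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 1.0 means the predicted risks order every comparable pair correctly, and 0.5 corresponds to uninformative predictions.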

RESULTS

The final ML models consist of six factors: patient's age, gender, marital status, surgical history, as well as tumor's histopathological classification, histological grade, and SEER stage. The prognostic models show strong discriminative ability, with the ranger model performing best. In the training set, validation set, and external validation set, the AUCs for 1-, 3-, and 5-year OS are all above 0.83, and the integrated Brier scores are consistently below 0.15. The explainability analysis of the ranger model also indicates that histological grade, histopathological classification, and age are the most influential factors in predicting OS.
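To make the Brier score threshold above concrete, here is a simplified sketch of a Brier score at a single time horizon. It is an assumption-laden illustration rather than the study's method: the function name `brier_score_at` and the toy data are hypothetical, patients censored before the horizon are dropped instead of being handled with inverse-probability-of-censoring weights, and no integration over time is performed.

```python
def brier_score_at(horizon, times, events, surv_probs):
    """Simplified Brier score at one time horizon.

    surv_probs[i] is the predicted probability that patient i survives
    beyond `horizon`. Patients censored before the horizon are skipped
    because their status at the horizon is unknown (a simplification:
    no censoring weights are applied).
    """
    total = 0.0
    n_used = 0
    for time, event, prob in zip(times, events, surv_probs):
        if time > horizon:
            observed = 1.0  # known to be alive at the horizon
        elif event:
            observed = 0.0  # death observed at or before the horizon
        else:
            continue  # censored before the horizon
        total += (observed - prob) ** 2
        n_used += 1
    if n_used == 0:
        raise ValueError("no patient has a known status at the horizon")
    return total / n_used


# Hypothetical toy cohort: months of follow-up and predicted 24-month survival.
toy_times = [6, 12, 30, 48]
toy_events = [1, 0, 1, 0]
toy_probs = [0.2, 0.5, 0.8, 0.9]
toy_brier = brier_score_at(24, toy_times, toy_events, toy_probs)
```

Lower is better: 0 corresponds to perfect predictions, and a constant prediction of 0.5 yields 0.25, which is why values below 0.15 suggest reasonably accurate predictions.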

CONCLUSIONS

The ranger ML prognostic model performed best among the six models and can be used to predict the OS of RLPS patients, offering clinicians a valuable reference for making informed decisions in advance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa79/11157677/8ae2f25f945f/CAM4-13-e7324-g001.jpg
