Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival.

Affiliations

Department of Research and Development, Netherlands Comprehensive Cancer Organization (IKNL), Zernikestraat 29, 5612 HZ, Eindhoven, The Netherlands.

Department of Health Technology and Services Research, University of Twente, Enschede, The Netherlands.

Publication information

Sci Rep. 2021 Mar 26;11(1):6968. doi: 10.1038/s41598-021-86327-7.

DOI:10.1038/s41598-021-86327-7

PMID:33772109

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7998037/

Abstract

Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although these have been shown to yield results at least as good as those of classical methods, they are often disregarded because they lack the transparency and explainability that are key to adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry on 36,658 non-metastatic breast cancer patients to compare the performance of CPH with that of ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival, measured with the concordance index (c-index). We demonstrated that, in our dataset, ML-based models can perform at least as well as classical CPH regression (c-index [Formula: see text]), and in the case of XGB even better (c-index [Formula: see text]). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models' predictions. We concluded that the difference in performance can be attributed to XGB's ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models' predictions and the insights these features provide. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial for increasing trust in, and adoption of, innovative ML techniques in oncology and in healthcare overall.
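The abstract compares models by their c-index. As a rough illustration of what that metric measures, here is a minimal Python sketch of Harrell's concordance index for right-censored data. It is not the authors' code: the function name `concordance_index`, the toy follow-up times, and the risk scores are all hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's c-index (simplified sketch, not the paper's implementation).

    times:       observed follow-up time per patient
    events:      1 if the event (e.g. death) was observed, 0 if censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the follow-up times differ and the
        # shorter one ended in an observed event (assumption: ties are skipped).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event received the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: five patients, two of them censored.
toy_times = [5, 8, 12, 3, 9]
toy_events = [1, 0, 1, 1, 0]
toy_risk = [0.8, 0.3, 0.85, 0.9, 0.4]
print(round(concordance_index(toy_times, toy_events, toy_risk), 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is why the metric can be applied to CPH and ML models alike: it only needs a risk ordering, not a particular model form.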


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/263f/7998037/5746be1f426a/41598_2021_86327_Fig1_HTML.jpg
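The abstract attributes predictions to features using SHAP values. The idea behind them can be sketched with a brute-force Shapley computation on a tiny model. This is an illustration only: it is not the SHAP library and not the authors' pipeline, the function `shapley_values` and the toy risk model (with made-up coefficients and an interaction term) are hypothetical, and "missing" features are replaced by a fixed baseline value, which is one of several possible conventions.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single prediction (exponential cost).

    Features outside a coalition are set to their baseline value
    (an assumption of this sketch).
    """
    n = len(x)

    def value(coalition):
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_model(z):
    # Hypothetical model: two main effects plus an interaction between
    # features 0 and 1; feature 2 is ignored by the model.
    return 0.5 * z[0] + 1.0 * z[1] + 0.3 * z[0] * z[1]


patient = [2.0, 3.0, 1.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk_model, patient, reference)
print(phi)
```

The attributions sum to the difference between the patient's prediction and the baseline prediction, the interaction term is split between the two features involved, and the unused feature receives zero. Practical tools approximate or speed up this computation, since the exact version scales exponentially with the number of features.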
