Survival analysis in breast cancer: evaluating ensemble learning techniques for prediction.

Author information

Buyrukoğlu Gonca

Affiliation

Department of Statistics, Faculty of Science, Çankırı Karatekin University, Çankırı, Turkey.

Publication information

PeerJ Comput Sci. 2024 Jul 10;10:e2147. doi: 10.7717/peerj-cs.2147. eCollection 2024.

Abstract

Breast cancer is the most common form of cancer among women worldwide. Although breast cancer research and awareness have gained considerable momentum, there is still no single treatment because the disease is heterogeneous. Survival data are of particular interest in breast cancer studies because they help describe the disease's dynamic and complex trajectories. This study examines the most important covariates affecting disease progression. It uses two datasets: the German Breast Cancer Study Group 2 (GBSG2) dataset and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset. In both datasets, the outcomes of interest are relapse of the disease and the time at which relapse occurs. Three models were used to analyse the datasets: the Cox proportional hazards (PH) model, the random survival forest (RSF) and the conditional inference forest (Cforest). The goal of the study is to apply these methods to the prediction of breast cancer progression and to compare their performance under two estimation methods: bootstrap estimation and bootstrap .632 estimation. Model performance was evaluated using the concordance index (C-index), a measure of discrimination, and prediction error curves (pec). For both datasets, the Cox PH model had a lower C-index and a larger prediction error than the RSF and Cforest approaches. The results on the GBSG2 and METABRIC datasets indicate that the RSF and Cforest algorithms provide non-parametric alternatives to the Cox PH model for estimating the survival probability of breast cancer patients.
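The two evaluation ideas named in the abstract, the C-index and the bootstrap .632 estimate, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `harrell_c_index` and `bootstrap_632` are hypothetical, the C-index shown is the standard Harrell pairwise definition, and the .632 rule is the usual weighted combination of the apparent (training) estimate and the out-of-bag bootstrap estimate. The abstract does not say which variant of either quantity was used.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's concordance index (illustrative, O(n^2) version).

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before subject j's follow-up time.
    The pair is concordant when the subject who relapsed earlier
    also has the higher predicted risk. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


def bootstrap_632(apparent: float, out_of_bag: float) -> float:
    """Bootstrap .632 estimate of a performance measure.

    apparent   : value computed on the same data used to fit the model
    out_of_bag : average value on observations left out of each bootstrap sample
    """
    return 0.368 * apparent + 0.632 * out_of_bag


if __name__ == "__main__":
    # Toy data (made up for illustration): relapse times in months,
    # event indicator (1 = relapse observed, 0 = censored), predicted risk.
    times = [5.0, 12.0, 20.0, 31.0]
    events = [1, 0, 1, 1]
    risk = [0.9, 0.6, 0.4, 0.1]
    print("C-index:", harrell_c_index(times, events, risk))
    print(".632 estimate:", bootstrap_632(apparent=0.20, out_of_bag=0.30))
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the abstract's finding of a lower C-index for the Cox PH model means it ranked patients' relapse risk less accurately than the two forest methods. The .632 weighting pulls the optimistic apparent estimate toward the more pessimistic out-of-bag estimate.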

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef71/11323082/8086d575414f/peerj-cs-10-2147-g001.jpg
