Comparison of variable selection methods for high-dimensional survival data with competing events.

Author information

Department of Biostatistics, Institut Claudius Regaud, IUCT-O, Toulouse, France.

Department of Clinical Research and Investigation, Biostatistics and Methodology Unit, Institut Paoli-Calmettes, Aix Marseille University, INSERM, IRD, SESSTIM, Marseille, France.

Publication information

Comput Biol Med. 2017 Dec 1;91:159-167. doi: 10.1016/j.compbiomed.2017.10.021. Epub 2017 Oct 20.

DOI:10.1016/j.compbiomed.2017.10.021

PMID:29078093

Abstract

BACKGROUND

In the era of personalized medicine, it is essential to identify gene signatures for each event type in the context of competing risks in order to improve risk stratification and treatment strategy. Until recently, little attention was paid to the performance of high-dimensional variable selection in deriving molecular signatures in this context. In this paper, we investigate the performance of two selection methods developed in the framework of high-dimensional data and competing risks: random survival forest and a boosting approach for fitting proportional subdistribution hazards models.
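
For readers unfamiliar with the term, the proportional subdistribution hazards model can be written out as follows. The abstract does not give the formulation, so this is the standard Fine and Gray form, assumed here. The symbols $T$ (event time), $\varepsilon$ (event type), $x$ (covariate vector), and $\beta_k$ (coefficients for event type $k$) are notation introduced for illustration.

```latex
% Subdistribution hazard for event type k: subjects who have already
% experienced a competing event remain in the risk set.
\lambda_k(t \mid x) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}
  P\bigl\{ t \le T < t + \Delta t,\ \varepsilon = k \,\bigm|\,
  T \ge t \ \text{or}\ (T < t \ \text{and}\ \varepsilon \ne k),\ x \bigr\}

% Proportional subdistribution hazards assumption
\lambda_k(t \mid x) = \lambda_{k,0}(t)\, \exp\bigl(x^{\top} \beta_k\bigr)

% Link to the cumulative incidence function of event type k
F_k(t \mid x) = P(T \le t,\ \varepsilon = k \mid x)
  = 1 - \exp\Bigl\{ -\int_0^t \lambda_k(s \mid x)\, ds \Bigr\}
```

Under this form, a gene with a nonzero entry in $\beta_k$ shifts the cumulative incidence of event type $k$ directly, which is why selecting nonzero coefficients yields a signature specific to each event type.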

METHODS

Using data from bladder cancer patients (GSE5479) and simulated datasets, the stability and prognostic performance of the two methods were evaluated using a resampling strategy. For each sample, the data set was split into 100 pairs of training and validation sets. Molecular signatures were developed in the training sets by the two selection methods and then applied to the corresponding validation sets.
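
The resampling strategy described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The selector `top_k_by_correlation` is a deliberately simple placeholder standing in for random survival forest or boosting, the outcome is a simulated continuous variable rather than competing-risks survival data, and the 2/3 training fraction is an assumption (the abstract does not state the split ratio). Only the selection step on the training sets is shown; applying signatures to the validation sets is omitted.

```python
import numpy as np


def top_k_by_correlation(X, y, k=10):
    """Placeholder selector (NOT the paper's method): indices of the k
    features with the largest absolute Pearson correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.abs(Xc.T @ yc) / denom
    return np.argsort(corr)[-k:]


def resample_selection_frequency(X, y, select, n_splits=100,
                                 train_frac=2 / 3, seed=0):
    """Repeat random train/validation splits, run the selector on each
    training set, and return per-feature selection frequencies together
    with the list of selected signatures."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_train = int(round(train_frac * n))  # assumed split ratio
    counts = np.zeros(p, dtype=int)
    signatures = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train = perm[:n_train]  # perm[n_train:] would be the validation set
        selected = select(X[train], y[train])
        counts[selected] += 1
        signatures.append(frozenset(int(j) for j in selected))
    return counts / n_splits, signatures


# Simulated data: 80 subjects, 500 features, only the first two informative.
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 500))
y = X[:, 0] + X[:, 1] + rng.normal(size=80)

freq, signatures = resample_selection_frequency(X, y, top_k_by_correlation)
print("most frequently selected features:", np.argsort(freq)[-5:][::-1])
```

The vector `freq` gives, for each feature, the proportion of the 100 training sets in which it was selected, which is one simple way to quantify how reproducible a signature is across resamples.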

RESULTS

Random survival forest and the boosting approach showed comparable performance for the prediction of survival data, with few selected genes in common. Nevertheless, the resampling approach identified many different sets of genes, and individual genes occurred in the signatures with very low frequency. Also, the smaller the training sample size, the lower the stability of the signatures.
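
One common way to put a number on this kind of instability is the mean pairwise Jaccard index between the signatures obtained from different resamples. The abstract does not say which stability measure the authors used, so the Jaccard index here is an assumption, and the gene names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(signatures):
    """Average Jaccard index over all pairs of signatures.
    Values near 0 indicate unstable selection, values near 1 stable."""
    pairs = list(combinations(signatures, 2))
    if not pairs:
        raise ValueError("need at least two signatures")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical signatures from three resamples.
signatures = [
    {"G1", "G2", "G3"},
    {"G2", "G3", "G4"},
    {"G5", "G6", "G7"},
]
stability = mean_pairwise_jaccard(signatures)
print(round(stability, 3))
```

In this toy case, only the first two signatures overlap (two shared genes out of four distinct ones), so the average over the three pairs is low, mirroring the pattern reported in the results.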

CONCLUSION

Random survival forest and the boosting approach give good predictive performance, but the gene signatures are very unstable. Further work is needed to propose adequate strategies for the analysis of high-dimensional data in the context of competing risks.

Similar articles

Comparison of Variable Selection Methods for Time-to-Event Data in High-Dimensional Settings.

Comput Math Methods Med. 2020 Jul 1;2020:6795392. doi: 10.1155/2020/6795392. eCollection 2020.

Competing risks data analysis with high-dimensional covariates: an application in bladder cancer.

Genomics Proteomics Bioinformatics. 2015 Jun;13(3):169-76. doi: 10.1016/j.gpb.2015.04.001. Epub 2015 Apr 20.

Boosting for high-dimensional time-to-event data with competing risks.

Bioinformatics. 2009 Apr 1;25(7):890-6. doi: 10.1093/bioinformatics/btp088. Epub 2009 Feb 25.

Tree-based models for survival data with competing risks.

Comput Methods Programs Biomed. 2018 Jun;159:185-198. doi: 10.1016/j.cmpb.2018.03.017. Epub 2018 Mar 21.

Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection.

BMC Bioinformatics. 2016 Jul 22;17:288. doi: 10.1186/s12859-016-1149-8.

Regularized Weighted Nonparametric Likelihood Approach for High-Dimension Sparse Subdistribution Hazards Model for Competing Risk Data.

Comput Math Methods Med. 2021 Sep 19;2021:5169052. doi: 10.1155/2021/5169052. eCollection 2021.

High-dimensional feature selection in competing risks modeling: A stable approach using a split-and-merge ensemble algorithm.

Biom J. 2023 Feb;65(2):e2100164. doi: 10.1002/bimj.202100164. Epub 2022 Aug 7.

A random forest approach for competing risks based on pseudo-values.

Stat Med. 2013 Aug 15;32(18):3102-14. doi: 10.1002/sim.5775. Epub 2013 Mar 18.

HiFreSP: A novel high-frequency sub-pathway mining approach to identify robust prognostic gene signatures.

Brief Bioinform. 2020 Jul 15;21(4):1411-1424. doi: 10.1093/bib/bbz078.

Cited by

Multicohort study testing the generalisability of the SASKit-ML stroke and PDAC prognostic model pipeline to other chronic diseases.

BMJ Open. 2024 Sep 30;14(9):e088181. doi: 10.1136/bmjopen-2024-088181.

Deep learning models for predicting the survival of patients with medulloblastoma based on a surveillance, epidemiology, and end results analysis.

Sci Rep. 2024 Jun 24;14(1):14490. doi: 10.1038/s41598-024-65367-9.

Re-evaluation of publicly available gene-expression databases using machine-learning yields a maximum prognostic power in breast cancer.

Sci Rep. 2023 Oct 5;13(1):16402. doi: 10.1038/s41598-023-41090-9.

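High-dimensional feature selection in competing risks modeling: A stable approach using a split-and-merge ensemble algorithm.

Biom J. 2023 Feb;65(2):e2100164. doi: 10.1002/bimj.202100164. Epub 2022 Aug 7.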

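Regularized Weighted Nonparametric Likelihood Approach for High-Dimension Sparse Subdistribution Hazards Model for Competing Risk Data.

Comput Math Methods Med. 2021 Sep 19;2021:5169052. doi: 10.1155/2021/5169052. eCollection 2021.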

Limitations of Explainability for Established Prognostic Biomarkers of Prostate Cancer.

Front Genet. 2021 Jul 22;12:649429. doi: 10.3389/fgene.2021.649429. eCollection 2021.

Variable selection with Group LASSO approach: Application to Cox regression with frailty model.

Commun Stat Simul Comput. 2021;50(3):881-901. doi: 10.1080/03610918.2019.1571605. Epub 2018 Feb 28.

Identification of a Hypoxia-Related Signature for Predicting Prognosis and the Immune Microenvironment in Bladder Cancer.

Front Mol Biosci. 2021 May 7;8:613359. doi: 10.3389/fmolb.2021.613359. eCollection 2021.

Variable selection methods for predicting clinical outcomes following allogeneic hematopoietic cell transplantation.

Sci Rep. 2021 Feb 5;11(1):3230. doi: 10.1038/s41598-021-82562-0.

Prognostic gene expression signatures of breast cancer are lacking a sensible biological meaning.

Sci Rep. 2021 Jan 8;11(1):156. doi: 10.1038/s41598-020-79375-y.
