Comparison of variable selection methods for high-dimensional survival data with competing events.

Author information

Department of Biostatistics, Institut Claudius Regaud, IUCT-O, Toulouse, France.

Department of Clinical Research and Investigation, Biostatistics and Methodology Unit, Institut Paoli-Calmettes, Aix Marseille University, INSERM, IRD, SESSTIM, Marseille, France.

Publication information

Comput Biol Med. 2017 Dec 1;91:159-167. doi: 10.1016/j.compbiomed.2017.10.021. Epub 2017 Oct 20.

Abstract

BACKGROUND

In the era of personalized medicine, it is essential to identify gene signatures for each event type in the context of competing risks in order to improve risk stratification and treatment strategy. Until recently, little attention was paid to how well high-dimensional selection methods perform when deriving molecular signatures in this context. In this paper, we investigate the performance of two selection methods developed for high-dimensional data with competing risks: random survival forest and a boosting approach for fitting proportional subdistribution hazards models.

METHODS

Using data from bladder cancer patients (GSE5479) and simulated datasets, the stability and prognostic performance of the two methods were evaluated with a resampling strategy. For each sample, the data set was split 100 times into training and validation sets. Molecular signatures were developed on the training sets by each of the two selection methods and then applied to the corresponding validation sets.
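The resampling strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the selector below is a simple correlation ranking used as a stand-in for random survival forest or boosting, the outcome is a toy continuous variable rather than competing-risks data, and the names `selection_frequencies`, `top_k_by_correlation`, and the 2/3 training fraction are assumptions made for illustration only.

```python
import numpy as np

def selection_frequencies(X, y, select, n_splits=100, train_frac=2/3, seed=0):
    """Fraction of random training splits in which each feature is selected.

    train_frac is an assumed value; the abstract does not state the split ratio.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_train = int(round(train_frac * n))
    counts = np.zeros(p)
    for _ in range(n_splits):
        # Draw a random training subset; the remaining rows would form the validation set.
        train = rng.permutation(n)[:n_train]
        chosen = select(X[train], y[train])
        counts[chosen] += 1
    return counts / n_splits

def top_k_by_correlation(X, y, k=10):
    """Placeholder selector (assumption): NOT random survival forest or boosting."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[-k:]

# Toy data: 120 subjects, 500 features, only the first two carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 500))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=120)

freq = selection_frequencies(X, y, top_k_by_correlation)
print("features selected in at least half of the splits:", np.flatnonzero(freq >= 0.5))
```

A feature with a frequency near 1 is selected in almost every training set, whereas frequencies near 0 across most features indicate that the signatures change from split to split.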

RESULTS

Random survival forest and the boosting approach show comparable predictive performance on survival data but share few selected genes. Nevertheless, the resampling approach identifies many different gene sets, and each gene appears in only a small fraction of the signatures. In addition, the smaller the training sample, the lower the stability of the signatures.

CONCLUSION

Random survival forest and the boosting approach give good predictive performance, but the resulting gene signatures are very unstable. Further work is needed to propose adequate strategies for analyzing high-dimensional data in the context of competing risks.
