Comparison of variable selection methods for high-dimensional survival data with competing events.

Author information

Department of Biostatistics, Institut Claudius Regaud, IUCT-O, Toulouse, France.

Department of Clinical Research and Investigation, Biostatistics and Methodology Unit, Institut Paoli-Calmettes, Aix Marseille University, INSERM, IRD, SESSTIM, Marseille, France.

Publication information

Comput Biol Med. 2017 Dec 1;91:159-167. doi: 10.1016/j.compbiomed.2017.10.021. Epub 2017 Oct 20.

Abstract

BACKGROUND

In the era of personalized medicine, it is essential to identify gene signatures for each event type in the context of competing risks in order to improve risk stratification and treatment strategy. Until recently, little attention was paid to the performance of high-dimensional variable selection in deriving molecular signatures in this context. In this paper, we investigate the performance of two selection methods developed for high-dimensional data with competing risks: random survival forest and a boosting approach for fitting proportional subdistribution hazards models.

METHODS

Using data from bladder cancer patients (GSE5479) and simulated datasets, the stability and prognostic performance of the two methods were evaluated using a resampling strategy. For each sample, the data set was split into 100 pairs of training and validation sets. Molecular signatures were developed in the training sets by the two selection methods and then applied to the corresponding validation sets.
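The resampling strategy can be illustrated with a minimal sketch. The abstract does not state the split proportion or how the splits were drawn, so the function name `resample_splits`, the two-thirds training fraction, the fixed seed, and the cohort size of 30 are all assumptions made for illustration, not the authors' settings.

```python
import random


def resample_splits(n_patients, n_splits=100, train_fraction=2 / 3, seed=0):
    """Return n_splits (train_idx, valid_idx) pairs from repeated random splits.

    train_fraction and seed are illustrative assumptions; the abstract
    only says that 100 training and validation sets were generated.
    """
    rng = random.Random(seed)
    indices = list(range(n_patients))
    n_train = round(n_patients * train_fraction)
    splits = []
    for _ in range(n_splits):
        shuffled = indices[:]          # copy so the original order is kept
        rng.shuffle(shuffled)
        train_idx = sorted(shuffled[:n_train])
        valid_idx = sorted(shuffled[n_train:])
        splits.append((train_idx, valid_idx))
    return splits


# Hypothetical cohort of 30 patients, split 100 times.
splits = resample_splits(n_patients=30)
```

In such a scheme, each selection method would be fitted on `train_idx` and its signature evaluated on the matching `valid_idx`, so that the two methods are always compared on the same partitions.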

RESULTS

Random survival forest and the boosting approach have comparable performance for the prediction of survival data, with few selected genes in common. Nevertheless, the resampling approach identifies many different sets of genes, with a very low frequency of occurrence of individual genes among the signatures. Also, the smaller the training sample size, the lower the stability of the signatures.
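One simple way to quantify how often a gene occurs among the signatures is its selection frequency across resampled training sets. The sketch below is an illustration only: the abstract does not specify the stability measure used, and the function name `selection_frequency` and the toy gene labels are hypothetical.

```python
from collections import Counter


def selection_frequency(signatures):
    """Fraction of signatures in which each gene appears.

    `signatures` is a list of gene collections, one per training set.
    """
    counts = Counter(gene for sig in signatures for gene in set(sig))
    n_signatures = len(signatures)
    return {gene: count / n_signatures for gene, count in counts.items()}


# Toy example with made-up gene labels: four signatures from four splits.
toy_signatures = [{"g1", "g2"}, {"g2", "g3"}, {"g2", "g4"}, {"g5"}]
freq = selection_frequency(toy_signatures)
```

Here `g2` is selected in three of the four toy signatures and every other gene in only one. A signature set in which most genes have low frequencies, as the results describe, would be considered unstable under this measure.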

CONCLUSION

Random survival forest and the boosting approach give good predictive performance, but the gene signatures are very unstable. Further work is needed to propose adequate strategies for the analysis of high-dimensional data in the context of competing risks.
