High-dimensional Cox models: the choice of penalty as part of the model building process.

Authors

Benner Axel, Zucknick Manuela, Hielscher Thomas, Ittrich Carina, Mansmann Ulrich

Affiliation

Division of Biostatistics, German Cancer Research Center, Heidelberg, Germany.

Publication

Biom J. 2010 Feb;52(1):50-69. doi: 10.1002/bimj.200900064.

Abstract

The Cox proportional hazards regression model is the most popular approach to modeling covariate information for survival times. In this context, the development of high-dimensional models, where the number of covariates is much larger than the number of observations (p>>n), is an ongoing challenge. A practicable approach in such situations is to use ridge-penalized Cox regression. Besides finding the best prediction rule, one is often interested in determining the subset of covariates that are most important for prognosis. This could be a gene set in the biostatistical analysis of microarray data. Covariate selection can then be done, for example, by L1-penalized Cox regression using the lasso (Tibshirani (1997). Statistics in Medicine 16, 385-395). Several approaches beyond the lasso that incorporate covariate selection have been developed in recent years. These include modifications of the lasso as well as nonconvex variants such as smoothly clipped absolute deviation (SCAD) (Fan and Li (2001). Journal of the American Statistical Association 96, 1348-1360; Fan and Li (2002). The Annals of Statistics 30, 74-99). The purpose of this article is to integrate these methods into the model building process when analyzing high-dimensional data with the Cox proportional hazards model. To evaluate penalized regression models beyond the lasso, we included SCAD variants and the adaptive lasso (Zou (2006). Journal of the American Statistical Association 101, 1418-1429). We compare them with "standard" applications such as ridge regression, the lasso, and the elastic net. Predictive accuracy, features of variable selection, and estimation bias are studied to assess the practical use of these methods. We observed that the performance of SCAD and the adaptive lasso is highly dependent on nontrivial preselection procedures. A practical solution to this problem does not yet exist. Since there is a high risk of missing relevant covariates when SCAD or the adaptive lasso is applied after an inappropriate initial selection step, we recommend staying with the lasso or the elastic net in actual data applications. However, given the promising results for truly sparse models, we see some advantage in SCAD and the adaptive lasso if better preselection procedures were available. This requires further methodological research.
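For readers unfamiliar with the penalties named in the abstract, the following is a minimal sketch of the penalty terms as they are commonly defined in the literature. It is not taken from the article: the function names, the scaling conventions (for example, whether the ridge term carries a factor of 1/2), the elastic net mixing parameter `alpha`, and the defaults `a=3.7` and `gamma=1.0` are assumptions made for illustration. In a penalized Cox model, one of these terms would be subtracted from the log partial likelihood before maximization.

```python
import math


def ridge_penalty(beta, lam):
    """L2 penalty: shrinks coefficients but does not set them to zero."""
    return lam * sum(b * b for b in beta)


def lasso_penalty(beta, lam):
    """L1 penalty: can set coefficients exactly to zero (selection)."""
    return lam * sum(abs(b) for b in beta)


def elastic_net_penalty(beta, lam, alpha=0.5):
    """Mix of L1 and L2 terms; alpha=1 gives the lasso term only.

    The parametrization here is one common convention, assumed for
    illustration; other sources scale the two terms differently.
    """
    return lam * sum(alpha * abs(b) + 0.5 * (1.0 - alpha) * b * b for b in beta)


def adaptive_lasso_penalty(beta, lam, beta_init, gamma=1.0):
    """Weighted L1 penalty with weights 1 / |beta_init_j| ** gamma.

    beta_init is an initial estimate (hypothetical input here). A zero
    initial estimate gives an infinite weight, so that covariate cannot
    re-enter the model; this is why the initial step matters.
    """
    total = 0.0
    for b, b0 in zip(beta, beta_init):
        if b0 == 0.0:
            if b != 0.0:
                return math.inf
            continue
        total += abs(b) / abs(b0) ** gamma
    return lam * total


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty in its usual three-piece form (requires a > 2).

    Behaves like the lasso near zero, then flattens so that large
    coefficients are not penalized further.
    """
    total = 0.0
    for b in beta:
        t = abs(b)
        if t <= lam:
            total += lam * t
        elif t <= a * lam:
            total += (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
        else:
            total += lam * lam * (a + 1.0) / 2.0
    return total
```

The constant tail of `scad_penalty` illustrates why SCAD is associated with lower estimation bias for large effects, while the infinite weight in `adaptive_lasso_penalty` illustrates how a covariate dropped in the initial step stays excluded.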
