Johnson Brent A, Long Qi, Chung Matthias
Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia 30322, USA.
Biometrics. 2011 Dec;67(4):1379-88. doi: 10.1111/j.1541-0420.2011.01587.x. Epub 2011 Apr 2.
Dimension reduction, model selection, and variable selection are ubiquitous concepts in modern statistical science, and deriving new methods beyond the scope of current methodology is noteworthy. This article briefly reviews existing regularization methods for penalized least squares and likelihood for survival data and their extension to a certain class of penalized estimating functions. We show that if one's goal is to estimate the entire regularized coefficient path using the observed survival data, then all current strategies fail for the Buckley-James estimating function. We propose a novel two-stage method to estimate and restore the entire Dantzig-regularized coefficient path for censored outcomes in a least-squares framework. We apply our methods to a microarray study of lung adenocarcinoma with sample size n = 200 and p = 1036 gene predictors and find 10 genes that are consistently selected across different criteria and an additional 14 genes that merit further investigation. In simulation studies, we found that the proposed path restoration and variable selection technique has the potential to perform as well as existing methods that begin with a proper convex loss function at the outset.
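For orientation, the Dantzig selector named in the abstract is commonly written, for uncensored linear regression, as the constrained problem below. This is the standard textbook form and not the censored-data estimator developed in the article; the symbols X (the n × p design matrix), y (the response vector), β (the coefficient vector), and λ (the tuning parameter) are notation assumed here for illustration.

```latex
\hat{\beta}(\lambda)
  = \arg\min_{\beta \in \mathbb{R}^{p}} \|\beta\|_{1}
  \quad \text{subject to} \quad
  \bigl\| X^{\top} (y - X\beta) \bigr\|_{\infty} \le \lambda .
```

The regularized coefficient path is then the map from λ to \hat{\beta}(\lambda) as λ varies over [0, ∞). With censored outcomes, y is only partially observed, which is the setting the abstract addresses through the Buckley-James estimating function.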