Song Rui, Lu Wenbin, Ma Shuangge, Jeng X Jessie
Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, USA.
Division of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut 06510, USA.
Biometrika. 2014;101(4):799-814. doi: 10.1093/biomet/asu047.
In modern statistical applications, the dimension of covariates can be much larger than the sample size. In the context of linear models, correlation screening (Fan and Lv, 2008) has been shown to reduce the dimension of such data effectively while achieving the sure screening property, i.e., all of the active variables can be retained with high probability. However, screening based on the Pearson correlation does not perform well when applied to contaminated covariates and/or censored outcomes. In this paper, we study censored rank independence screening of high-dimensional survival data. The proposed method is robust to predictors that contain outliers, works for a general class of survival models, and enjoys the sure screening property. Simulations and an analysis of real data demonstrate that the proposed method performs competitively on survival data sets with moderate sample sizes and high-dimensional predictors, even when the predictors are contaminated.
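To make the idea of rank-based marginal screening under censoring concrete, the following is a minimal sketch. The abstract does not state the paper's screening statistic, so this example substitutes a Harrell-type concordance index as a stand-in; it is not the authors' estimator. The function names `marginal_concordance` and `rank_screen` and the parameter `keep` are hypothetical, introduced only for illustration.

```python
import numpy as np


def marginal_concordance(x, time, event):
    """Harrell-type concordance between one covariate and a right-censored outcome.

    Assumption: this is a stand-in statistic, not the one defined in the paper.
    A pair (i, j) is usable when subject i has an observed event and
    time[i] < time[j]. The result is the fraction of usable pairs in which
    x[i] > x[j], with ties in x counted as 1/2.
    """
    x = np.asarray(x, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    # usable[i, j] is True when i is an observed event that precedes j.
    usable = event[:, None] & (time[:, None] < time[None, :])
    n_usable = usable.sum()
    if n_usable == 0:
        return 0.5  # no comparable pairs: treat the covariate as uninformative

    # Only the ordering of x enters, so outlying covariate values have
    # no more influence than any other value with the same rank.
    diff = x[:, None] - x[None, :]
    score = (diff > 0) + 0.5 * (diff == 0)
    return float(score[usable].sum() / n_usable)


def rank_screen(X, time, event, keep):
    """Rank columns of X by |concordance - 0.5| and return the top `keep` indices."""
    X = np.asarray(X, dtype=float)
    utility = np.array(
        [abs(marginal_concordance(X[:, j], time, event) - 0.5)
         for j in range(X.shape[1])]
    )
    order = np.argsort(-utility, kind="stable")  # largest utility first
    return order[:keep], utility
```

Because the statistic depends on the covariate only through pairwise orderings, replacing a value with an extreme outlier that preserves its rank leaves the utility unchanged, which is the kind of robustness that a Pearson-correlation screen lacks. The pairwise comparison costs O(n^2) memory per covariate, which is acceptable for the moderate sample sizes discussed here.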