Mittal Sushil, Madigan David, Cheng Jerry Q, Burd Randall S
Department of Statistics, Columbia University, 1255 Amsterdam Avenue, New York, NY 10027, USA.
Stat Med. 2013 Oct 15;32(23):3955-71. doi: 10.1002/sim.5817. Epub 2013 Apr 28.
Survival analysis has been a topic of active statistical research in the past few decades, with applications spread across several areas. Traditional applications usually consider data with only a small number of predictors and a few hundred or a few thousand observations. Recent advances in data acquisition techniques and computational power have led to considerable interest in analyzing very-high-dimensional data, where the number of predictor variables and the number of observations each range between 10^4 and 10^6. In this paper, we present a tool for performing large-scale regularized parametric survival analysis using a variant of the cyclic coordinate descent method. Through our experiments on two real data sets, we show that applying regularized models to high-dimensional data avoids overfitting and can provide improved predictive performance and calibration over corresponding low-dimensional models.
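To make the idea of regularized parametric survival regression by cyclic coordinate descent concrete, here is a minimal sketch. It is not the authors' tool. Several choices are assumptions made only for illustration: an exponential survival model with hazard exp(x·beta), a ridge (L2) penalty, a clipped one-dimensional Newton update per coordinate, and the hypothetical names `fit_exponential_ridge`, `lam`, `n_sweeps`, and `max_step`.

```python
import math


def fit_exponential_ridge(X, times, events, lam=1.0, n_sweeps=200, max_step=1.0):
    """Illustrative cyclic coordinate descent for a ridge-penalized
    exponential survival model (an assumed model, not the paper's code).

    Subject i has hazard exp(x_i . beta). The objective minimized is
        sum_i [t_i * exp(x_i . beta) - d_i * (x_i . beta)] + (lam / 2) * ||beta||^2
    where t_i is the observed time and d_i is 1 for an event, 0 if censored.
    lam must be positive so every one-dimensional Hessian is positive.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    eta = [0.0] * n  # cached linear predictors x_i . beta

    for _ in range(n_sweeps):
        for j in range(p):  # visit the coordinates in a fixed cycle
            grad = lam * beta[j]
            hess = lam
            for i in range(n):
                xij = X[i][j]
                if xij == 0.0:
                    continue  # zero entries contribute nothing
                w = times[i] * math.exp(eta[i])
                grad += xij * (w - events[i])
                hess += xij * xij * w
            # One-dimensional Newton step, clipped to a fixed interval
            step = max(-max_step, min(max_step, -grad / hess))
            if step == 0.0:
                continue
            beta[j] += step
            for i in range(n):  # keep the cached predictors in sync
                if X[i][j] != 0.0:
                    eta[i] += step * X[i][j]
    return beta
```

With a single constant predictor and a very small penalty, the fitted coefficient should approach the log of (number of events / total follow-up time), which is the closed-form maximum likelihood estimate for the exponential model. Updating one coefficient at a time while caching the linear predictors keeps each update cheap when most entries of a column are zero, which is one reason coordinate-wise methods are attractive at the scales the abstract mentions.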