Meijer Rosa J, Goeman Jelle J
Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Postzone S5-P, P.O. Box 9604, 2300 RC Leiden, The Netherlands.
Biom J. 2013 Mar;55(2):141-55. doi: 10.1002/bimj.201200088. Epub 2013 Jan 24.
Cross-validation is a frequently used resampling method in model building and model evaluation, but it can be quite time consuming. In this article, we discuss a much faster approximation method that can be used in generalized linear models and Cox' proportional hazards model with a ridge penalty term. The approximation is based on a Taylor expansion around the estimate of the full model, so that all cross-validated estimates are approximated without refitting the model. The tuning parameter can then be chosen based on these approximations and optimized in less time. The method is most accurate when approximating leave-one-out cross-validation results for large data sets, which is otherwise the most computationally demanding situation. To demonstrate the method's performance, we apply it to several microarray data sets. The method is implemented in the R package penalized, which is available on CRAN.
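To make the idea concrete, the following is a minimal sketch of one plausible reading of the approximation, written for ridge-penalized logistic regression. It assumes that the Taylor expansion amounts to a single Newton-Raphson step from the full-model estimate towards each leave-one-out estimate; the abstract does not give the exact formula, so this is not claimed to be what the article or the penalized package computes. All function names (fit_ridge_logistic, approx_loo_coefs, exact_loo_coefs, approx_cv_loglik) are hypothetical, the model has no intercept, and the penalty is taken to be (lam / 2) * ||beta||^2.

```python
import numpy as np


def fit_ridge_logistic(X, y, lam, n_iter=50, tol=1e-10):
    """Fit ridge-penalized logistic regression (no intercept) by Newton-Raphson."""
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        # Gradient of the penalized log-likelihood.
        grad = X.T @ (y - prob) - lam * beta
        # Negative Hessian: X' W X + lam * I, with W = diag(prob * (1 - prob)).
        w = prob * (1.0 - prob)
        A = X.T @ (X * w[:, None]) + lam * np.eye(p)
        step = np.linalg.solve(A, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def approx_loo_coefs(X, y, lam, beta_hat):
    """Approximate all leave-one-out estimates without refitting.

    Assumption: one Newton step from beta_hat on the penalized
    log-likelihood with observation i removed. Because the full gradient
    is zero at beta_hat, the leave-one-out gradient there is
    -x_i * (y_i - prob_i).
    """
    n, p = X.shape
    prob = 1.0 / (1.0 + np.exp(-X @ beta_hat))
    w = prob * (1.0 - prob)
    A = X.T @ (X * w[:, None]) + lam * np.eye(p)
    coefs = np.empty((n, p))
    for i in range(n):
        # Negative Hessian without the contribution of observation i.
        A_i = A - w[i] * np.outer(X[i], X[i])
        coefs[i] = beta_hat - np.linalg.solve(A_i, X[i] * (y[i] - prob[i]))
    return coefs


def exact_loo_coefs(X, y, lam):
    """Reference implementation: refit the model n times."""
    n = X.shape[0]
    return np.array([
        fit_ridge_logistic(np.delete(X, i, axis=0), np.delete(y, i), lam)
        for i in range(n)
    ])


def approx_cv_loglik(X, y, lam):
    """Approximate leave-one-out cross-validated log-likelihood for one lam."""
    beta_hat = fit_ridge_logistic(X, y, lam)
    coefs = approx_loo_coefs(X, y, lam, beta_hat)
    # Linear predictor of each left-out observation under its own estimate.
    eta = np.sum(X * coefs, axis=1)
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))
```

Under these assumptions, a tuning parameter could be selected by evaluating approx_cv_loglik over a grid of lam values and keeping the maximizer, at the cost of one model fit per grid point rather than n fits. Comparing approx_loo_coefs with exact_loo_coefs on simulated data is a simple way to check how close the one-step approximation is for a given sample size.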