Boonstra Philip S, Mukherjee Bhramar, Taylor Jeremy MG
University of Michigan.
Ann Appl Stat. 2013 Dec 1;7(4):2272-2292. doi: 10.1214/13-AOAS668.
Motivated by the increasing use of and rapid changes in array technologies, we consider the prediction problem of fitting a linear regression relating a continuous outcome Y to a large number of covariates X, e.g., measurements from current, state-of-the-art technology. For most of the samples, only the outcome Y and surrogate covariates, W, are available. These surrogates may be data from prior studies using older technologies. Owing to the dimension of the problem and the large fraction of missing information, a critical issue is appropriate shrinkage of model parameters for an optimal bias-variance tradeoff. We discuss a variety of fully Bayesian and Empirical Bayes algorithms which account for uncertainty in the missing data and adaptively shrink parameter estimates for superior prediction. These methods are evaluated via a comprehensive simulation study. In addition, we apply our methods to a lung cancer dataset, predicting survival time (Y) using qRT-PCR (X) and microarray (W) measurements.
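To make the data structure concrete, the following is a minimal sketch of the setting on simulated data, not the authors' algorithms. It assumes a hypothetical setup in which the surrogate W is a noisy copy of X and X is observed for only a small subsample. It compares a complete-case ridge fit with a simple single-imputation ridge fit. Both use a fixed penalty chosen arbitrarily for illustration, and the imputation step ignores uncertainty in the missing data, which is exactly what the Bayesian methods described above are designed to account for. All names, sample sizes, and penalty values are assumptions.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y, no intercept."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(0)
n, n_complete, p = 200, 40, 30  # hypothetical sizes: X is seen for 40 of 200 samples

# Simulated data (assumption): W is X plus independent measurement noise.
beta = rng.normal(scale=0.3, size=p)
X = rng.normal(size=(n, p))
W = X + rng.normal(scale=0.5, size=(n, p))
y = X @ beta + rng.normal(size=n)

# Complete cases: samples for which Y, X, and W are all available.
Xc, Wc, yc = X[:n_complete], W[:n_complete], y[:n_complete]

# (a) Complete-case ridge: uses only the samples where X is observed.
beta_cc = ridge_fit(Xc, yc, lam=5.0)

# (b) Single imputation: learn a ridge map from W to X on the complete cases,
#     fill in the missing X from W, then fit ridge on all n samples.
A = ridge_fit(Wc, Xc, lam=5.0)  # p x p coefficient matrix, one column per covariate
X_imputed = np.vstack([Xc, W[n_complete:] @ A])
beta_imp = ridge_fit(X_imputed, y, lam=5.0)

# Compare prediction error on fresh data where X is fully observed.
X_new = rng.normal(size=(1000, p))
y_new = X_new @ beta + rng.normal(size=1000)


def mspe(b):
    """Mean squared prediction error of coefficient vector b on the new data."""
    return float(np.mean((y_new - X_new @ b) ** 2))


print("complete-case MSPE:", mspe(beta_cc))
print("imputation MSPE:   ", mspe(beta_imp))
```

Which of the two fits predicts better depends on how informative W is about X and on the penalty, which is why data-adaptive shrinkage is the central concern here.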