Liu Fei, Dunson David, Zou Fei
IBM T. J. Watson Research Center, Yorktown Heights, New York 10598, USA.
Biometrics. 2011 Jun;67(2):504-12. doi: 10.1111/j.1541-0420.2010.01466.x. Epub 2010 Aug 5.
This article considers the problem of selecting predictors of time to an event from a high-dimensional set of candidate predictors using data from multiple studies. As an alternative to current multistage testing approaches, we propose to model the study-to-study heterogeneity explicitly, using a hierarchical model to borrow strength across studies. Our method incorporates censored data through an accelerated failure time model. Using a carefully formulated prior specification, we develop a fast approach to predictor selection and shrinkage estimation for high-dimensional predictors. For model fitting, we develop a Monte Carlo expectation maximization (MC-EM) algorithm to accommodate censored data. The proposed approach, which is related to the relevance vector machine (RVM), relies on maximum a posteriori estimation to rapidly obtain a sparse estimate. As with the typical RVM, there is an intrinsic thresholding property in which unimportant predictors tend to have their coefficients shrunk to zero. We compare our method with some commonly used procedures through simulation studies. We also illustrate the method using gene expression barcode data from three breast cancer studies.
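To make the role of the Monte Carlo step concrete, the following is a minimal sketch of how right-censored observations might be handled inside an E-step of an accelerated failure time model. It is not the authors' algorithm: the abstract does not state the error distribution, so the sketch assumes normally distributed errors on the log-time scale (a log-normal AFT model), and the function name `impute_censored_log_times` and all of its parameters are hypothetical. For each censored subject, it draws log event times from a normal distribution truncated below at the log censoring time and averages the draws.

```python
import random
from statistics import NormalDist


def impute_censored_log_times(log_t, censored, mu, sigma, n_draws=200, seed=0):
    """Monte Carlo estimate of E[log T_i | observed data] for each subject.

    Assumption (not from the abstract): log T_i ~ Normal(mu_i, sigma^2).
    log_t    : observed log times (log censoring time when censored)
    censored : True where the observation is right-censored
    mu       : current linear predictor for each subject
    sigma    : current error standard deviation on the log scale
    """
    rng = random.Random(seed)
    eps = 1e-12  # keeps probabilities strictly inside (0, 1) for inv_cdf
    expected = []
    for y, is_censored, m in zip(log_t, censored, mu):
        if not is_censored:
            # Fully observed event time: nothing to impute.
            expected.append(y)
            continue
        dist = NormalDist(m, sigma)
        lower = dist.cdf(y)  # probability mass below the censoring time
        draws = []
        for _ in range(n_draws):
            # Inverse-CDF sampling from the normal truncated to (y, infinity).
            u = min(max(rng.uniform(lower, 1.0), eps), 1.0 - eps)
            draws.append(max(y, dist.inv_cdf(u)))
        expected.append(sum(draws) / n_draws)
    return expected
```

In a full MC-EM loop, the imputed values would feed an M-step that updates the regression coefficients and `sigma`, after which the imputation is repeated with the new linear predictor. The sparsity-inducing prior and the hierarchical structure across studies described in the abstract are deliberately left out of this sketch.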