Liu Fei
Department of Statistics, University of Missouri, Columbia, MO, USA.
Methods Mol Biol. 2010;620:538-46. doi: 10.1007/978-1-60761-580-4_20.
Many biomedical applications involve selecting important predictors from a high-dimensional set of candidates; gene expression data are one example. Because the sample size in any single study is usually small, it is important to combine information from multiple studies. In this chapter, we introduce a Bayesian hierarchical modeling approach that models study-to-study heterogeneity explicitly in order to borrow strength across studies. Using a carefully formulated prior specification, we develop a fast approach to predictor selection and shrinkage estimation for high-dimensional predictors. The proposed approach, which is related to the relevance vector machine (RVM), relies on maximum a posteriori (MAP) estimation to rapidly obtain a sparse estimate. As with the typical RVM, there is an intrinsic thresholding property whereby unimportant predictors tend to have their coefficients shrunk to zero. The method is illustrated with an application to selecting genes as predictors of time to an event.
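The thresholding behavior described in the abstract can be illustrated with a minimal sketch of the standard, single-study RVM for linear regression. This is not the chapter's hierarchical multi-study model and it does not handle time-to-event outcomes; it only shows how per-coefficient prior precisions can drive unimportant coefficients to exactly zero. The function name `rvm_regression`, the Gaussian likelihood, the re-estimation rule for the precisions, and the pruning cap `alpha_cap` are all assumptions made for illustration.

```python
import numpy as np


def rvm_regression(X, y, n_iter=500, alpha_cap=1e6):
    """Illustrative standard RVM-style sparse linear regression.

    Assumes y = X w + Gaussian noise with independent priors
    w_j ~ N(0, 1 / alpha_j). Not the chapter's hierarchical model.
    """
    n, p = X.shape
    alpha = np.ones(p)                # prior precision of each coefficient
    beta = 1.0 / np.var(y)            # noise precision (crude starting value)
    active = np.ones(p, dtype=bool)   # predictors not yet pruned
    w = np.zeros(p)

    for _ in range(n_iter):
        if not active.any():
            break
        Xa = X[:, active]

        # Gaussian posterior of the active coefficients given alpha and beta;
        # its mean is also the MAP estimate for fixed hyperparameters.
        Sigma = np.linalg.inv(beta * Xa.T @ Xa + np.diag(alpha[active]))
        mu = beta * Sigma @ Xa.T @ y

        # gamma_j measures how well the data determine coefficient j (0 to 1).
        gamma = np.clip(1.0 - alpha[active] * np.diag(Sigma), 1e-12, 1.0)

        # Re-estimate the prior precisions and the noise precision.
        alpha[active] = gamma / np.maximum(mu ** 2, 1e-30)
        resid = y - Xa @ mu
        beta = max(n - gamma.sum(), 1e-6) / max(resid @ resid, 1e-12)

        w[:] = 0.0
        w[active] = mu

        # Thresholding: a very large precision pins the coefficient at zero,
        # so that predictor is dropped for the remaining iterations.
        active &= alpha < alpha_cap

    w[~active] = 0.0
    return w


# Synthetic example (hypothetical data): 3 of 20 predictors carry signal.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[[0, 5, 10]] = [2.0, -3.0, 1.5]
y_demo = X_demo @ w_true + 0.5 * rng.standard_normal(100)

w_hat = rvm_regression(X_demo, y_demo)
print("nonzero coefficients:", np.flatnonzero(w_hat))
print("estimates:", np.round(w_hat[np.flatnonzero(w_hat)], 2))
```

In this sketch, predictors whose estimated prior precision exceeds the cap are removed and their coefficients are set to exactly zero, which is one simple way to realize the thresholding property; a few noise predictors may still survive with small coefficients.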