Department of Biostatistics, Erasmus MC Rotterdam, The Netherlands.
Stat Med. 2012 May 20;31(11-12):1221-37. doi: 10.1002/sim.4439. Epub 2012 Jan 25.
The objective of finding a parsimonious representation of the observed data by a statistical model that is also capable of accurate prediction is commonplace in all domains of statistical applications. The parsimony of the solutions obtained by variable selection is usually counterbalanced by a limited prediction capacity. On the other hand, methodologies that assure high prediction accuracy usually lead to models that are neither simple nor easily interpretable. Regularization methodologies have proven to be useful in addressing both prediction and variable selection problems. The Bayesian approach to regularization constitutes a particularly attractive alternative as it is suitable for high-dimensional modeling, offers valid standard errors, and enables simultaneous estimation of regression coefficients and complexity parameters via computationally efficient MCMC techniques. Bayesian regularization falls within the versatile framework of Bayesian hierarchical models, which encompasses a variety of other approaches suited for variable selection such as spike and slab models and the MC³ approach. In this article, we review these Bayesian developments and evaluate their variable selection performance in a simulation study for the classical small p large n setting. The majority of the existing Bayesian methodology for variable selection deals only with classical linear regression. Here, we present two applications in the contexts of binary and survival regression, where the Bayesian approach was applied to select markers prognostically relevant for the development of rheumatoid arthritis and for overall survival in acute myeloid leukemia patients.