Rossell David, Rubio Francisco J
Universitat Pompeu Fabra, Department of Business and Economics, Barcelona (Spain).
London School of Hygiene & Tropical Medicine, London (United Kingdom).
J Am Stat Assoc. 2018;113(524):1742-1758. doi: 10.1080/01621459.2017.1371025. Epub 2018 Jun 28.
Bayesian variable selection often assumes normality, but the effects of model misspecification are not sufficiently understood. There are sound reasons behind this assumption, particularly for large p: ease of interpretation and analytical and computational convenience. More flexible frameworks exist, including semi- or non-parametric models, often at the cost of some tractability. We propose a simple extension that allows for skewness and thicker-than-normal tails but preserves tractability. It leads to easy interpretation and a log-concave likelihood that facilitates optimization and integration. We characterize the asymptotic behavior of parameter estimates and Bayes factor rates under certain forms of model misspecification. Under suitable conditions, misspecified Bayes factors induce sparsity at the same rates as under the correct model. However, the rates to detect signal change by an exponential factor, often reducing sensitivity. These deficiencies can be ameliorated by inferring the error distribution, a simple strategy that can improve inference substantially. Our work focuses on the likelihood and can be combined with any likelihood penalty or prior, but here we focus on non-local priors to induce extra sparsity and ameliorate finite-sample effects caused by misspecification. We show the importance of considering the likelihood, rather than solely the prior, for Bayesian variable selection. The methodology is available in the R package 'mombf'.
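As a rough illustration of what an error model with skewness, thicker-than-normal tails and a log-concave likelihood can look like, the sketch below uses a two-piece (asymmetric) Laplace density. The abstract does not state the error family, so this choice, the parameterization (a scale and an asymmetry parameter alpha in (-1, 1)), and all function and variable names are assumptions made for illustration, not the API of 'mombf'. The code checks numerically that the density integrates to one and that the log-likelihood satisfies the midpoint concavity inequality in the regression coefficients.

```python
import numpy as np

def two_piece_laplace_logpdf(e, scale=1.0, alpha=0.0):
    """Log-density of a two-piece Laplace residual (hypothetical parameterization).

    alpha in (-1, 1) controls asymmetry; alpha = 0 gives the symmetric Laplace.
    """
    e = np.asarray(e, dtype=float)
    # Negative residuals use scale*(1 + alpha), non-negative ones scale*(1 - alpha).
    side_scale = np.where(e < 0, scale * (1 + alpha), scale * (1 - alpha))
    return -np.log(2 * scale) - np.abs(e) / side_scale

def loglik(theta, X, y, scale=1.0, alpha=0.0):
    """Log-likelihood of the linear model y = X @ theta + error."""
    return float(np.sum(two_piece_laplace_logpdf(y - X @ theta, scale, alpha)))

# Simulated data (illustrative values only).
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
theta_true = np.array([1.0, 0.0, -0.5])
y = X @ theta_true + rng.laplace(size=n)

# Check 1: the density integrates to (approximately) one, via a Riemann sum.
grid = np.linspace(-60.0, 60.0, 240001)
dx = grid[1] - grid[0]
total_mass = float(np.sum(np.exp(two_piece_laplace_logpdf(grid, 1.0, 0.3))) * dx)

# Check 2: midpoint concavity of the log-likelihood in theta,
# i.e. loglik((a + b) / 2) >= (loglik(a) + loglik(b)) / 2.
a, b = rng.normal(size=p), rng.normal(size=p)
mid = loglik((a + b) / 2, X, y, alpha=0.3)
avg = 0.5 * (loglik(a, X, y, alpha=0.3) + loglik(b, X, y, alpha=0.3))
concave_ok = mid >= avg - 1e-9

print("total mass:", total_mass)
print("midpoint concavity holds:", concave_ok)
```

Concavity follows here because each residual contributes a piecewise-linear penalty whose slope increases from negative to positive at zero; that property is what makes optimization and integration over the coefficients comparatively easy.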