David Kaplan, Chansoon Lee
University of Wisconsin, Madison, WI, USA.
Eval Rev. 2018 Aug;42(4):423-457. doi: 10.1177/0193841X18761421. Epub 2018 Apr 11.
This article provides a review of Bayesian model averaging as a means of optimizing the predictive performance of common statistical models applied to large-scale educational assessments. The Bayesian framework recognizes that in addition to parameter uncertainty, there is uncertainty in the choice of models themselves. A Bayesian approach to addressing the problem of model uncertainty is the method of Bayesian model averaging. Bayesian model averaging searches the space of possible models for a set of submodels that satisfy certain scientific principles and then averages the coefficients across these submodels weighted by each model's posterior model probability (PMP). Using the weighted coefficients for prediction has been shown to yield optimal predictive performance according to certain scoring rules. We demonstrate the utility of Bayesian model averaging for prediction in education research with three examples: Bayesian regression analysis, Bayesian logistic regression, and a recently developed approach for Bayesian structural equation modeling. In each case, the model-averaged estimates are shown to yield better prediction of the outcome of interest than any submodel based on predictive coverage and the log-score rule. Implications for the design of large-scale assessments when the goal is optimal prediction in a policy context are discussed.
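As an illustration of the averaging step the abstract describes, the following is a minimal sketch of Bayesian model averaging for linear regression. It is not the authors' implementation: it assumes a uniform prior over all predictor subsets, approximates each posterior model probability (PMP) from the BIC, and runs on simulated data. The function name `bma_linear` and all variable names are hypothetical.

```python
import itertools
import numpy as np


def bma_linear(X, y):
    """Average OLS coefficients over all predictor subsets,
    weighting each submodel by a BIC-approximated PMP (assumption)."""
    n, p = X.shape
    models, bics, coefs = [], [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            # Design matrix: intercept plus the selected predictors
            Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = float(np.sum((y - Z @ beta) ** 2))
            # Gaussian-likelihood BIC, up to an additive constant
            bics.append(n * np.log(rss / n) + Z.shape[1] * np.log(n))
            # Place coefficients in a full-length vector; excluded ones stay 0
            full = np.zeros(p + 1)
            full[[0] + [j + 1 for j in subset]] = beta
            coefs.append(full)
            models.append(subset)
    bics = np.array(bics)
    # PMP is proportional to exp(-BIC / 2) under a uniform model prior
    weights = np.exp(-0.5 * (bics - bics.min()))
    pmp = weights / weights.sum()
    # Model-averaged coefficients: PMP-weighted sum across submodels
    coef_avg = pmp @ np.array(coefs)
    return models, pmp, coef_avg


# Simulated data (assumption): only the first of three predictors matters
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=200)

models, pmp, coef_avg = bma_linear(X, y)
best = int(np.argmax(pmp))
print("highest-PMP submodel (predictor indices):", models[best])
print("model-averaged coefficients (intercept first):", coef_avg)
```

Exhaustive enumeration is only practical for a small number of predictors. With many candidate predictors, as is typical in large-scale assessments, the model space has to be searched or sampled rather than enumerated.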