Burgette Lane F, Reiter Jerome P
RAND Corporation.
Duke University, Department of Statistical Science.
Bayesian Anal. 2013 Jun 1;8(2). doi: 10.1214/13-BA816.
Multinomial outcomes with many levels can be challenging to model. Information typically accrues slowly with increasing sample size, yet the parameter space expands rapidly with additional covariates. Shrinking all regression parameters towards zero, as is often done in models of continuous or binary response variables, is unsatisfactory, since setting parameters equal to zero in multinomial models does not necessarily imply "no effect." We propose an approach to modeling multinomial outcomes with many levels based on a Bayesian multinomial probit (MNP) model and a multiple shrinkage prior distribution for the regression parameters. The prior distribution encourages the MNP regression parameters to shrink toward a number of learned locations, thereby substantially reducing the dimension of the parameter space. Using simulated data, we compare the predictive performance of this model against two other recently proposed methods for big multinomial models. The results suggest that the fully Bayesian, multiple shrinkage approach can outperform these other methods. We apply the multiple shrinkage MNP to simulating replacement values for areal identifiers, e.g., census tract indicators, in order to protect data confidentiality in public use datasets.
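The remark that a zero coefficient does not necessarily imply "no effect" can be illustrated with a small simulation. The sketch below is not the authors' model or code: it assumes a toy three-category probit with one covariate, a baseline category whose latent utility is fixed at zero, and independent standard normal errors (an identity covariance, chosen here only for simplicity). The function name `mnp_probs` and all parameter values are hypothetical.

```python
import numpy as np


def mnp_probs(x, betas, n_draws=200_000, seed=0):
    """Monte Carlo outcome probabilities for a toy multinomial probit.

    Assumptions (illustrative only): category 0 is the baseline with latent
    utility 0; category j >= 1 has latent utility x * betas[j - 1] + eps_j,
    with eps_j ~ N(0, 1) independent. The outcome is the category with the
    largest latent utility.
    """
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    # One row of latent utilities per draw, one column per non-baseline category.
    utilities = x * betas + rng.standard_normal((n_draws, betas.size))
    # Prepend the baseline category's fixed utility of zero.
    utilities = np.column_stack([np.zeros(n_draws), utilities])
    choice = utilities.argmax(axis=1)
    return np.bincount(choice, minlength=betas.size + 1) / n_draws


# Category 1 has a coefficient of exactly zero; category 2 does not.
betas = [0.0, 1.5]
for x in (0.0, 2.0):
    print(f"x = {x}: P(Y = 0, 1, 2) ~ {np.round(mnp_probs(x, betas), 3)}")
```

In this toy setup the probability of category 1 falls sharply as the covariate increases, even though its own coefficient is zero, because the utility of category 2 rises relative to it. A zero coefficient here means only "same covariate effect as the baseline category", which is why shrinking every parameter toward zero is not a neutral default.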