Green Paul E, Park Taesung
Department of Epidemiology, University of Michigan, Ann Arbor, Michigan 48105, USA.
Biometrics. 2003 Dec;59(4):886-96. doi: 10.1111/j.0006-341x.2003.00103.x.
Log-linear models have been shown to be useful for smoothing contingency tables when categorical outcomes are subject to nonignorable nonresponse. A log-linear model can be fit to an augmented data table that includes an indicator variable designating whether subjects are respondents or nonrespondents. Maximum likelihood estimates calculated from the augmented data table are known to suffer from instability due to boundary solutions. Park and Brown (1994, Journal of the American Statistical Association 89, 44-52) and Park (1998, Biometrics 54, 1579-1590) developed empirical Bayes models that tend to smooth estimates away from the boundary. In those approaches, estimates for nonrespondents were calculated using an EM algorithm by maximizing a posterior distribution. As an extension of their earlier work, we develop a Bayesian hierarchical model that incorporates a log-linear model in the prior specification. In addition, due to uncertainty in the variable selection process associated with just one log-linear model, we simultaneously consider a finite number of models using a stochastic search variable selection (SSVS) procedure due to George and McCulloch (1997, Statistica Sinica 7, 339-373). The integration of the SSVS procedure into a Markov chain Monte Carlo (MCMC) sampler is straightforward, and leads to estimates of cell frequencies for the nonrespondents that are averages resulting from several log-linear models. The methods are demonstrated with a data example involving serum creatinine levels of patients who survived renal transplants. A simulation study is conducted to investigate properties of the model.
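The SSVS step mentioned above can be illustrated with a minimal sketch. It assumes the generic normal-mixture ("spike and slab") prior of George and McCulloch, in which each log-linear coefficient is drawn from a narrow normal with standard deviation tau when its indicator is 0, and from a wide normal with standard deviation c * tau when its indicator is 1. The abstract does not give the exact prior specification used in this paper, so the function names, the parameters (tau, c, prior_p), and the values used here are hypothetical and serve only to show how an indicator would be updated inside one MCMC sweep.

```python
import math
import random


def log_normal_pdf(x, sd):
    """Log density of a zero-mean normal with standard deviation sd."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * (x / sd) ** 2


def inclusion_probability(beta, tau, c, prior_p):
    """Conditional probability that the indicator equals 1 given beta.

    Assumed prior (generic SSVS form, not taken from the paper):
      indicator = 0: beta ~ N(0, tau^2)        (spike)
      indicator = 1: beta ~ N(0, (c * tau)^2)  (slab)
      P(indicator = 1) = prior_p
    """
    log_slab = math.log(prior_p) + log_normal_pdf(beta, c * tau)
    log_spike = math.log(1.0 - prior_p) + log_normal_pdf(beta, tau)
    log_odds = log_slab - log_spike
    # Numerically stable logistic transform of the log odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def sample_indicator(beta, tau, c, prior_p, rng):
    """Draw the indicator from its Bernoulli full conditional."""
    p = inclusion_probability(beta, tau, c, prior_p)
    return 1 if rng.random() < p else 0


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical current values of three log-linear coefficients.
    for beta in (0.0, 0.3, 2.0):
        p = inclusion_probability(beta, tau=0.1, c=10.0, prior_p=0.5)
        g = sample_indicator(beta, tau=0.1, c=10.0, prior_p=0.5, rng=rng)
        print(f"beta={beta:4.1f}  P(include)={p:.4f}  sampled indicator={g}")
```

In a full sampler this draw would alternate with updates of the coefficients and of the imputed nonrespondent cell counts. Averaging the imputed counts over the retained draws then gives estimates that combine several log-linear models, which is the model-averaging behaviour the abstract describes.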