Ibrahim Joseph G, Zhu Hongtu, Tang Niansheng
Joseph G. Ibrahim is Alumni Distinguished Professor (E-mail:
J Am Stat Assoc. 2008 Dec 1;103(484):1648-1658. doi: 10.1198/016214508000001057.
We consider novel methods for the computation of model selection criteria in missing-data problems based on the output of the EM algorithm. The methodology is very general and can be applied to numerous situations involving incomplete data within an EM framework, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Toward this goal, we develop a class of information criteria for missing-data problems, called $IC_{H,Q}$, which yields the Akaike information criterion and the Bayesian information criterion as special cases. The computation of $IC_{H,Q}$ requires an analytic approximation to a complicated function, called the H-function, along with output from the EM algorithm used in obtaining maximum likelihood estimates. The approximation to the H-function leads to a large class of information criteria, called $IC_{\tilde{H}^{(k)},Q}$. Theoretical properties of $IC_{\tilde{H}^{(k)},Q}$, including consistency, are investigated in detail. To eliminate the analytic approximation to the H-function, a computationally simpler approximation to $IC_{H,Q}$, called $IC_{Q}$, is proposed, the computation of which depends solely on the Q-function of the EM algorithm. Advantages and disadvantages of $IC_{\tilde{H}^{(k)},Q}$ and $IC_{Q}$ are discussed and examined in detail in the context of missing-data problems. Extensive simulations are given to demonstrate the methodology and examine the small-sample and large-sample performance of $IC_{\tilde{H}^{(k)},Q}$ and $IC_{Q}$ in missing-data problems. An AIDS data set also is presented to illustrate the proposed methodology.
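The role of the Q-function and H-function in such criteria can be sketched with the standard EM decomposition of the observed-data log-likelihood. The notation below is an assumption introduced for illustration and is not quoted from the abstract: $y_{obs}$ and $y_{mis}$ denote the observed and missing data, $\hat{\theta}$ the maximum likelihood estimate, $d$ the number of parameters, $n$ the sample size, and $c_n$ a generic penalty term. The last line is only one plausible reading of a criterion that "depends solely on the Q-function"; the exact definitions should be taken from the paper itself.

```latex
% Standard EM quantities (assumed notation)
Q(\theta \mid \theta') = E\left[\log f(y_{obs}, y_{mis}; \theta) \mid y_{obs}; \theta'\right], \qquad
H(\theta \mid \theta') = E\left[\log f(y_{mis} \mid y_{obs}; \theta) \mid y_{obs}; \theta'\right]

% Observed-data log-likelihood decomposition, valid for any \theta'
\ell_{obs}(\theta) = \log f(y_{obs}; \theta) = Q(\theta \mid \theta') - H(\theta \mid \theta')

% A penalized criterion written in terms of Q and H
-2\,\ell_{obs}(\hat{\theta}) + c_n
  = -2\,Q(\hat{\theta} \mid \hat{\theta}) + 2\,H(\hat{\theta} \mid \hat{\theta}) + c_n

% Familiar special cases of the penalty
c_n = 2d \;\Rightarrow\; \text{AIC}, \qquad c_n = d \log n \;\Rightarrow\; \text{BIC}

% Dropping the H term leaves a quantity computable from the Q-function alone
-2\,Q(\hat{\theta} \mid \hat{\theta}) + c_n
```

Under these assumptions, the Q term is available directly from the final E-step of the EM algorithm, whereas the H term involves the conditional density of the missing data given the observed data, which is why it would typically call for an analytic approximation.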