Shin Yongyun, Raudenbush Stephen W
University of Michigan, 439 West Hall, 1085 South University, Ann Arbor, Michigan 48109-1107, USA.
Biometrics. 2007 Dec;63(4):1262-8. doi: 10.1111/j.1541-0420.2007.00818.x. Epub 2007 May 14.
The development of model-based methods for incomplete data has been a seminal contribution to statistical practice. Under the assumption of ignorable missingness, one estimates the joint distribution of the complete data for θ ∈ Θ from the incomplete or observed data y_obs. Many interesting models involve one-to-one transformations of θ. For example, with y_i ~ N(μ, Σ) for i = 1, ..., n and θ = (μ, Σ), an ordinary least squares (OLS) regression model is a one-to-one transformation of θ. Inferences based on such a transformation are equivalent to inferences based on OLS using data multiply imputed from f(y_mis | y_obs, θ) for missing y_mis. Thus, identification of θ from y_obs is equivalent to identification of the regression model. In this article, we consider a model for two-level data with continuous outcomes where the observations within each cluster are dependent. The parameters of the hierarchical linear model (HLM) of interest, however, lie in a subspace of Θ in general. This identification of the joint distribution overidentifies the HLM. We show how to characterize the joint distribution so that its parameters are a one-to-one transformation of the parameters of the HLM. This leads to efficient estimation of the HLM from incomplete data using either the transformation method or the method of multiple imputation. The approach allows outcomes and covariates to be missing at either of the two levels, and the HLM of interest can involve the regression of any subset of variables on a disjoint subset of variables conceived as covariates.
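The single-level example in the abstract can be illustrated with a minimal sketch, which is not the authors' code. It assumes complete data and uses the standard multivariate-normal identities: slopes = Σ_xx⁻¹ Σ_xy, intercept = μ_y − slopes' μ_x, and residual variance = Σ_yy − Σ_yx Σ_xx⁻¹ Σ_xy. The function name `regression_from_moments`, the simulated data, and the choice of which column plays the outcome are all hypothetical. Together with the marginal moments of the covariates (μ_x, Σ_xx), these regression parameters determine θ = (μ, Σ) and vice versa, which is the sense in which the mapping is one-to-one.

```python
import numpy as np


def regression_from_moments(mu, sigma, y_idx, x_idx):
    """Map theta = (mu, Sigma) to the regression of variable y_idx on x_idx.

    Returns (intercept, slopes, residual variance). Hypothetical helper
    written for illustration only.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    s_xx = sigma[np.ix_(x_idx, x_idx)]      # covariance among covariates
    s_xy = sigma[x_idx, y_idx]              # covariance of covariates with outcome
    slopes = np.linalg.solve(s_xx, s_xy)    # Sigma_xx^{-1} Sigma_xy
    intercept = mu[y_idx] - slopes @ mu[x_idx]
    resid_var = sigma[y_idx, y_idx] - s_xy @ slopes
    return intercept, slopes, resid_var


# Simulated complete data (assumed values, for illustration only).
rng = np.random.default_rng(0)
true_mu = np.array([1.0, 0.0, -0.5])
true_sigma = np.array([[2.0, 0.6, 0.3],
                       [0.6, 1.0, 0.2],
                       [0.3, 0.2, 1.5]])
data = rng.multivariate_normal(true_mu, true_sigma, size=500)

# Maximum likelihood estimate of theta from complete data (divisor n).
mu_hat = data.mean(axis=0)
sigma_hat = np.cov(data, rowvar=False, bias=True)

# Transformation method: regress column 0 on columns 1 and 2 via theta.
intercept, slopes, resid_var = regression_from_moments(
    mu_hat, sigma_hat, y_idx=0, x_idx=[1, 2]
)

# Direct OLS fit on the same data, for comparison.
y = data[:, 0]
X = np.column_stack([np.ones(len(data)), data[:, 1:]])
ols_coef = np.linalg.lstsq(X, y, rcond=None)[0]
ols_resid_var = np.mean((y - X @ ols_coef) ** 2)

print("via theta:", intercept, slopes, resid_var)
print("via OLS:  ", ols_coef[0], ols_coef[1:], ols_resid_var)
```

With complete data the two routes agree up to floating-point error. With incomplete data, the same function could in principle be applied to an estimate of θ obtained from y_obs (for example by an EM algorithm), which this sketch does not implement. The abstract's point is that this simple correspondence breaks down in the two-level setting unless the joint distribution is parameterized so that it matches the HLM exactly.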