Miao Wang, Tchetgen Tchetgen Eric
Peking University and Harvard University.
Stat Sin. 2018 Oct;28(4):2049-2067. doi: 10.5705/ss.202016.0322.
We study identification of parametric and semiparametric models with missing covariate data. When covariate data are missing not at random, identification is not guaranteed even under fairly restrictive parametric assumptions, as we illustrate with several examples. We propose a general approach to establishing identification of parametric and semiparametric models when a covariate is missing not at random. Without auxiliary information about the missingness process, identification of parametric models depends strongly on model specification. However, given a fully observed shadow variable, which is correlated with the missing covariate but otherwise independent of its missingness, identification is more broadly achievable, including in fairly large semiparametric models. In the shadow variable setting, we give special consideration to generalized linear models with the missingness process left unrestricted. In that setting, the outcome model is identified for familiar generalized linear models, and we provide counterexamples in which identification fails. For estimation, we describe an inverse probability weighted estimator that uses the shadow variable to estimate the missingness process, and we evaluate its performance in simulations.