Empirical Likelihood in Nonignorable Covariate-Missing Data Problems

Authors

Xie Yanmei, Zhang Biao

Publication information

Int J Biostat. 2017 Apr 20;13(1):/j/ijb.2017.13.issue-1/ijb-2016-0053/ijb-2016-0053.xml. doi: 10.1515/ijb-2016-0053.

DOI:10.1515/ijb-2016-0053

PMID:28441139

Abstract

Missing covariate data occur often in regression analysis, which arises frequently in the health and social sciences as well as in survey sampling. We study methods for analyzing a nonignorable covariate-missing data problem under an assumed conditional mean function when some covariates are completely observed but others are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719-30) on regression analyses with nonignorable missing covariates, in which they introduced two working models: the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems, with the objective of using the two working models effectively in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations in which there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to combine these unbiased estimating equations optimally. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with those of existing competitors. We present a simulation study comparing the finite-sample performance of the various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).
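
The idea of using empirical likelihood to combine more unbiased estimating equations than there are parameters can be illustrated with a small generic sketch. This is not the estimator proposed in the paper: it does not involve missing covariates or the two working models. It assumes a toy setting with one parameter `theta` and two moment conditions, E[X] = theta and E[X^2] = theta^2 + 1 (that is, a known unit variance). All function names (`estimating_functions`, `log_star`, `profile_neg_log_el`, `max_el_estimate`) are hypothetical and chosen for illustration only, and the grid search over `theta` is a simplification.

```python
import numpy as np

def estimating_functions(x, theta):
    """Toy over-identified system: 2 estimating functions, 1 parameter.
    Assumption (illustrative only): E[X] = theta and Var(X) = 1."""
    return np.column_stack([x - theta, x**2 - theta**2 - 1.0])

def log_star(z, eps):
    """Pseudo-logarithm: log(z) for z >= eps, quadratic extension below eps.
    Keeps the inner objective finite and concave for every multiplier."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    ok = z >= eps
    out[ok] = np.log(z[ok])
    out[~ok] = np.log(eps) - 1.5 + 2.0 * z[~ok] / eps - 0.5 * (z[~ok] / eps) ** 2
    return out

def profile_neg_log_el(g, n_iter=50, tol=1e-8):
    """Maximize sum_i log*(1 + lam' g_i) over lam with damped Newton steps.
    Returns the maximized value (the negative log EL ratio) and lam."""
    n, q = g.shape
    eps = 1.0 / n
    lam = np.zeros(q)

    def f(l):
        return log_star(1.0 + g @ l, eps).sum()

    for _ in range(n_iter):
        z = 1.0 + g @ lam
        zc = np.maximum(z, eps)  # avoids division by zero in the unused branch
        d1 = np.where(z >= eps, 1.0 / zc, 2.0 / eps - z / eps**2)
        d2 = np.where(z >= eps, -1.0 / zc**2, -1.0 / eps**2)
        grad = g.T @ d1
        if np.linalg.norm(grad) < tol:
            break
        hess = (g * d2[:, None]).T @ g      # negative definite
        step = np.linalg.solve(hess, grad)  # Newton update is lam - step
        t = 1.0
        while f(lam - t * step) < f(lam) and t > 1e-8:
            t *= 0.5                        # step halving to guarantee ascent
        lam = lam - t * step
    return f(lam), lam

def max_el_estimate(x, half_width=0.5, n_grid=201):
    """Maximum EL estimate of theta by grid search around the sample mean."""
    grid = np.linspace(x.mean() - half_width, x.mean() + half_width, n_grid)
    values = [profile_neg_log_el(estimating_functions(x, th))[0] for th in grid]
    return grid[int(np.argmin(values))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=1.0, size=500)
    print("maximum EL estimate of theta:", max_el_estimate(x))
```

The inner step finds the Lagrange multiplier for a fixed `theta`, and the outer step picks the `theta` with the smallest negative log empirical likelihood ratio. With more equations than parameters the multiplier is generally nonzero at the optimum, which is how the extra equation reweights the observations.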
