Zhao L P, Lipsitz S, Lew D
Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98104, USA.
Biometrics. 1996 Dec;52(4):1165-82.
Missing covariate data are among the most common problems in regression analysis. Practitioners frequently adopt the so-called complete-case analysis, i.e., they exclude records with missing covariates and analyze only the remaining complete records. A complete-case analysis is convenient with existing statistical packages, but it may be inefficient because the observed outcomes and covariates on records with missing covariates go unused. It can even give misleading statistical inference if the data are not missing completely at random. This paper introduces a joint estimating equation (JEE) for regression analysis when observations are missing on one covariate; the JEE may be viewed as one method within the general framework for the missing covariate data problem proposed by Robins, Rotnitzky, and Zhao (1994, Journal of the American Statistical Association 89, 846-866). A generalization of the JEE to more than one such covariate is discussed. The JEE is generally applicable to estimating the coefficients of a regression model, including linear and logistic regression. Provided that the covariate data are either missing completely at random or missing at random, and that mild regularity conditions hold, the JEE estimates of the regression coefficients are consistent and asymptotically normal. Simulation results show that the asymptotic distribution approximates the distribution of the estimated coefficients well in finite samples. The simulation study also shows that the validity of the JEE estimates depends on correct specification of the probability function that characterizes the missing-data mechanism, which suggests a need for further research on how to make the estimation robust to this nuisance assumption. Finally, the JEE is illustrated with an application to a case-control study of diet and thyroid cancer.
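The warning about complete-case analysis can be illustrated with a small simulation. The sketch below is not the JEE. It is a plain inverse-probability-weighted least-squares fit, used here only as a simple stand-in for methods that model the probability of observing the covariate. Everything in it is an assumption made for illustration: the linear model with intercept 1 and slope 2, the rule that the covariate is observed with probability 0.9 when the outcome exceeds 1 and 0.1 otherwise (missing at random, since the outcome is always observed), and the helper names `weighted_fit` and `simulate`. The observation probabilities are treated as known, which is rarely true in practice.

```python
import random


def weighted_fit(x, y, w):
    """Weighted least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def simulate(n=20000, beta0=1.0, beta1=2.0, seed=0):
    """Compare complete-case and weighted fits on one simulated dataset.

    Hypothetical setup: x is observed with probability 0.9 if y > 1 and
    0.1 otherwise, so missingness depends only on the always-observed
    outcome (missing at random, but not completely at random).
    """
    rng = random.Random(seed)
    xs, ys, ps = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = beta0 + beta1 * x + rng.gauss(0.0, 1.0)
        p_obs = 0.9 if y > 1.0 else 0.1  # assumed known here
        if rng.random() < p_obs:
            # Only records with x observed reach either analysis.
            xs.append(x)
            ys.append(y)
            ps.append(p_obs)
    # Complete-case analysis: every retained record gets weight 1.
    complete_case = weighted_fit(xs, ys, [1.0] * len(xs))
    # Weighted analysis: each retained record gets weight 1 / p_obs.
    weighted = weighted_fit(xs, ys, [1.0 / p for p in ps])
    return complete_case, weighted


if __name__ == "__main__":
    (cc_b0, cc_b1), (w_b0, w_b1) = simulate()
    print(f"complete-case slope: {cc_b1:.3f}")
    print(f"weighted slope:      {w_b1:.3f}")
```

Under these assumptions the complete-case slope should fall noticeably below the true value of 2, while the weighted slope should land close to it. The weighted fit depends on the observation probabilities being right, which mirrors the abstract's point that validity rests on correct specification of the missing-data mechanism.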