Lipsitz S R, Ibrahim J G, Chen M H, Peterson H
Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.
Stat Med. 1999;18(17-18):2435-48. doi: 10.1002/(sici)1097-0258(19990915/30)18:17/18<2435::aid-sim267>3.0.co;2-b.
We propose a likelihood method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. In this paper, we focus on one missing covariate. We use a logistic model for the probability that the covariate is missing, and allow this probability to depend on the incomplete covariate. We allow the covariates, including the incomplete covariate, to be either categorical or continuous. We propose an EM algorithm in this case. For a missing categorical covariate, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For a missing continuous covariate, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. The methodology is illustrated using an example from a breast cancer clinical trial in which time to disease progression is the outcome, and the incomplete covariate is a quality of life physical well-being score taken after the start of therapy. This score may be missing because the patients are sicker, so this covariate could be non-ignorably missing.
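The categorical case described above can be illustrated with a small sketch. This is not the authors' implementation. It assumes a deliberately simplified setting: a binary outcome `y` stands in for the generalized linear model response, the incomplete covariate `x` is binary, and the missingness indicator `r` depends only on `x`. With every model saturated, a logistic model is equivalent to a table of cell probabilities, so both the E-step (posterior weights for each possible value of the missing `x`) and the M-step (weighted proportions) are closed form. The function names, simulated parameter values, and starting values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate(n, rng):
    """Simulate binary x, binary y given x, and a missingness flag r that depends on x."""
    x = rng.binomial(1, 0.4, n)                       # P(x = 1) = 0.4 (assumed)
    y = rng.binomial(1, np.where(x == 1, 0.7, 0.2))   # P(y = 1 | x) (assumed)
    r = rng.binomial(1, np.where(x == 1, 0.5, 0.1))   # r = 1 means x is missing
    x_obs = np.where(r == 1, -1, x)                   # -1 marks a missing value
    return y, x_obs, r

def em_fit(y, x_obs, r, n_iter=500):
    """EM for P(x = 1), P(y = 1 | x) and P(r = 1 | x) with x missing when r = 1."""
    px = 0.5                      # starting value for P(x = 1)
    py = np.array([0.4, 0.6])     # starting values for P(y = 1 | x = 0), P(y = 1 | x = 1)
    pm = np.array([0.3, 0.3])     # starting values for P(r = 1 | x = 0), P(r = 1 | x = 1)
    levels = np.arange(2)
    # mask[i, k] = 1 if x = k is compatible with what was observed for subject i
    mask = np.where(r[:, None] == 1, 1.0, (x_obs[:, None] == levels).astype(float))
    loglik = []
    for _ in range(n_iter):
        # E-step: joint probability of (x = k, y_i, r_i), restricted to compatible k
        prior = np.array([1 - px, px])
        lik_y = np.where(y[:, None] == 1, py, 1 - py)
        lik_r = np.where(r[:, None] == 1, pm, 1 - pm)
        joint = prior * lik_y * lik_r * mask
        total = joint.sum(axis=1, keepdims=True)
        loglik.append(float(np.log(total).sum()))     # observed-data log-likelihood
        w = joint / total                             # posterior weights for x = 0, 1
        # M-step: weighted proportions maximize the expected complete-data log-likelihood
        n_k = w.sum(axis=0)
        px = n_k[1] / len(y)
        py = (w * y[:, None]).sum(axis=0) / n_k
        pm = (w * r[:, None]).sum(axis=0) / n_k
    return {"px": px, "py": py, "pm": pm}, np.array(loglik)

rng = np.random.default_rng(0)
y, x_obs, r = simulate(20000, rng)
est, loglik = em_fit(y, x_obs, r)

logit = lambda p: np.log(p / (1 - p))
print("P(x = 1):", est["px"])
print("log-odds ratio of missingness for x = 1 vs x = 0:",
      logit(est["pm"][1]) - logit(est["pm"][0]))
```

In this simplified setting the missingness parameters are identified only because `x` affects `y`: the distribution of `y` among subjects with missing `x` is a mixture whose weights reveal how many of them have `x = 1`. Letting the missingness probability also depend on `y`, or using a continuous covariate, would require the iterative or Monte Carlo steps that the abstract describes and is beyond this sketch.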