Division of General Internal Medicine, Brigham and Women's Hospital and Ariadne Labs, 1620 Tremont St. 3rd Floor, BC3 002D, Boston, MA, 02120-1613, USA.
McLean Hospital, Belmont, MA, USA.
Psychometrika. 2020 Dec;85(4):890-904. doi: 10.1007/s11336-020-09729-y. Epub 2020 Oct 2.
This paper considers multiple imputation (MI) approaches for handling non-monotone missing longitudinal binary responses when estimating parameters of a marginal model using generalized estimating equations (GEE). GEE has been shown to yield consistent estimates of the regression parameters for a marginal model when data are missing completely at random (MCAR). However, when data are missing at random (MAR), the GEE estimates may not be consistent; the MI approaches proposed in this paper minimize bias under MAR. The first MI approach proposed is based on a multivariate normal distribution, but with the addition of pairwise products among the binary outcomes to the multivariate normal vector. Even though the multivariate normal does not impute 0 or 1 values for the missing binary responses, as discussed by Horton et al. (Am Stat 57:229-232, 2003), we suggest not rounding when filling in the missing binary data because rounding could increase bias. The second MI approach considered is the fully conditional specification (FCS) approach. In this approach, we specify a logistic regression model for each outcome given the outcomes at the other time points and the covariates. Typically, one would include only main effects of the outcomes at the other time points as predictors in the FCS approach, but we explore whether bias can be reduced by also including pairwise interactions of the outcomes at the other time points in the FCS. In a study of asymptotic bias with non-monotone missing data, the proposed MI approaches are also compared to GEE without imputation. Finally, the proposed methods are illustrated using data from a longitudinal clinical trial comparing four psychosocial treatments from the National Institute on Drug Abuse Collaborative Cocaine Treatment Study, where patients' cocaine use is collected monthly for 6 months during treatment.
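The augmentation step behind the first MI approach, appending pairwise products of the binary outcomes to the vector that the multivariate normal model is fit to, can be sketched as follows. This is only an illustrative sketch and not the authors' implementation: the function name `augment_with_pairwise_products`, the NaN coding of missing responses, and the choice to treat a product as missing whenever either factor is missing are all assumptions made here.

```python
from itertools import combinations

import numpy as np


def augment_with_pairwise_products(y):
    """Append all pairwise products y_j * y_k (j < k) to the outcome matrix.

    y is an (n, T) array of 0/1 responses with np.nan marking a missing
    response (hypothetical coding). Assumes T >= 2.
    Returns the (n, T + T*(T-1)/2) augmented array and the list of (j, k) pairs.
    """
    y = np.asarray(y, dtype=float)
    n_times = y.shape[1]
    pairs = list(combinations(range(n_times), 2))
    # Assumption: NaN propagates through multiplication, so a product is
    # treated as missing whenever either outcome is missing (even if the
    # observed one is 0).
    products = np.column_stack([y[:, j] * y[:, k] for j, k in pairs])
    return np.hstack([y, products]), pairs


# Toy data: 3 subjects, 3 time points, non-monotone missingness.
y = np.array([
    [1.0, 0.0, np.nan],
    [1.0, 1.0, 1.0],
    [np.nan, 1.0, 0.0],
])
augmented, pairs = augment_with_pairwise_products(y)
```

The augmented matrix, together with the covariates, would then be passed to whatever multivariate normal imputation routine is in use; consistent with the recommendation above, the imputed values for the binary responses would be left unrounded.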