Schnitzer Mireille E, Lok Judith J, Bosch Ronald J
Faculté de pharmacie, Université de Montréal, Montréal, Québec H3C 3J7, Canada
The Department of Biostatistics and the Center for Biostatistics in AIDS Research at Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
Biostatistics. 2016 Jan;17(1):165-77. doi: 10.1093/biostatistics/kxv028. Epub 2015 Jul 29.
In longitudinal data arising from observational or experimental studies, dependent subject drop-out is common. If the goal is to estimate the parameters of a marginal complete-data model for the outcome, fitting the model of interest to only the uncensored subjects yields biased inference. For example, investigators may be interested in estimating a prognostic model for clinical events in HIV-positive patients under the counterfactual scenario in which everyone remained on antiretroviral therapy (ART), when in reality only a subset did. Inverse probability of censoring weighting (IPCW) is a popular method that relies on correct estimation of the probability of censoring to produce consistent estimates, but it is inefficient in its standard form. We introduce sequentially augmented regression (SAR), an adaptation of the Bang and Robins (2005. Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962-972.) method, to estimate a complete-data prediction model while adjusting for longitudinal missing-at-random censoring. In addition, we propose a closely related non-parametric approach using targeted maximum likelihood estimation (TMLE; van der Laan and Rubin, 2006. Targeted maximum likelihood learning. The International Journal of Biostatistics 2 (1), Article 11). We compare IPCW, SAR, and TMLE (implemented parametrically and with Super Learner) through simulation and the above-mentioned case study.
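To make the bias from analysing only uncensored subjects concrete, the following is a minimal sketch of inverse probability of censoring weighting for a single time point and a simple marginal mean. It is not the authors' SAR or TMLE procedure and does not handle longitudinal censoring or a regression model. The data-generating values, the single binary covariate, and the function names (`simulate`, `complete_case_mean`, `ipcw_mean`) are all assumptions chosen for illustration.

```python
import random


def simulate(n, seed=0):
    """Simulate (covariate, outcome, censored) triples under missing at random.

    Hypothetical data-generating process, chosen only for illustration:
    the binary covariate raises both the outcome risk and the chance of
    being censored, so uncensored subjects are not representative.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = 1 if rng.random() < 0.5 else 0
        y = 1 if rng.random() < (0.7 if l == 1 else 0.2) else 0
        p_uncensored = 0.3 if l == 1 else 0.9
        c = 0 if rng.random() < p_uncensored else 1  # c == 1 means censored
        rows.append((l, y, c))
    return rows


def complete_case_mean(rows):
    """Naive estimate: average the outcome over uncensored subjects only."""
    observed = [y for (_, y, c) in rows if c == 0]
    return sum(observed) / len(observed)


def ipcw_mean(rows):
    """IPCW estimate of the marginal mean outcome.

    The probability of remaining uncensored is estimated within each
    covariate stratum; each uncensored outcome is then weighted by the
    inverse of that probability and the total is divided by the full
    sample size. Assumes every stratum contains uncensored subjects.
    """
    p_uncensored = {}
    for level in (0, 1):
        stratum = [c for (l, _, c) in rows if l == level]
        p_uncensored[level] = sum(1 for c in stratum if c == 0) / len(stratum)
    weighted_total = sum(y / p_uncensored[l] for (l, y, c) in rows if c == 0)
    return weighted_total / len(rows)


if __name__ == "__main__":
    data = simulate(20000, seed=1)
    # True marginal mean under this simulation: 0.5 * 0.7 + 0.5 * 0.2 = 0.45
    print("complete-case estimate:", round(complete_case_mean(data), 3))
    print("IPCW estimate:         ", round(ipcw_mean(data), 3))
```

Under these assumed parameters the complete-case average converges to 0.325 rather than the true 0.45, while the weighted estimate targets 0.45 as long as the censoring probabilities are estimated correctly. This dependence on a correctly specified censoring model is the limitation that doubly robust approaches such as those described in the abstract aim to relax.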