Gruber Susan, Logan Roger W, Jarrín Inmaculada, Monge Susana, Hernán Miguel A
Department of Epidemiology, Harvard School of Public Health, Boston, MA, U.S.A.
Stat Med. 2015 Jan 15;34(1):106-17. doi: 10.1002/sim.6322. Epub 2014 Oct 15.
Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross-validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV-positive subjects. Both ensemble approaches produced hazard ratio estimates farther from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL, with comparable results.
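The abstract describes both ensembles only at a high level. Below is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It estimates the denominator model of a stabilized weight, P(A=1 | covariates), by blending two hypothetical candidate learners (an intercept-only model and a model stratified on one binary covariate). The blend uses convex weights chosen by grid search to minimize held-out negative log-likelihood. The SL-style and EL-style runs differ only in where the held-out predictions come from: V-fold cross-validation (`vfold_splits`) versus a single training/validation partition (`single_split`). The function names, the loss, the weight grid, the 30% validation fraction, the final refit on all data, and the toy data are all assumptions for illustration. `stabilized_weight` multiplies, over visits, the numerator model's probability of the observed treatment divided by the denominator model's probability.

```python
import math
import random
from itertools import product


# Hypothetical candidate learners for P(A=1 | covariates); each fit returns a predict(x) function.
def fit_intercept_only(X, y):
    p = sum(y) / len(y)
    return lambda x: p


def fit_stratified(X, y, j=0):
    # Separate treatment probability for each level of the binary covariate x[j].
    overall = sum(y) / len(y)
    means = {}
    for level in (0, 1):
        ys = [yi for xi, yi in zip(X, y) if xi[j] == level]
        means[level] = sum(ys) / len(ys) if ys else overall
    return lambda x: means[x[j]]


def nll(p, y, eps=1e-6):
    # Mean Bernoulli negative log-likelihood, with predictions clipped away from 0 and 1.
    total = 0.0
    for pi, yi in zip(p, y):
        pi = min(max(pi, eps), 1 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y)


def best_convex_weights(preds, y, steps=10):
    # Grid search over convex weights (nonnegative, summing to 1) applied to held-out predictions.
    def blend(w):
        return [sum(wm * preds[m][i] for m, wm in enumerate(w)) for i in range(len(y))]
    grid = ([c / steps for c in combo]
            for combo in product(range(steps + 1), repeat=len(preds))
            if sum(combo) == steps)
    return min(grid, key=lambda w: nll(blend(w), y))


def held_out_blend(X, y, learners, splits):
    # splits: list of (train_idx, valid_idx) pairs. V pairs -> SL-style; one pair -> EL-style.
    preds, ys = [[] for _ in learners], []
    for train, valid in splits:
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for m, fit in enumerate(learners):
            f = fit(X_tr, y_tr)
            preds[m].extend(f(X[i]) for i in valid)
        ys.extend(y[i] for i in valid)
    w = best_convex_weights(preds, ys)
    fits = [fit(X, y) for fit in learners]  # assumption: refit each candidate on all data
    return w, lambda x: sum(wm * f(x) for wm, f in zip(w, fits))


def vfold_splits(n, V=5, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    splits = []
    for v in range(V):
        valid = idx[v::V]
        held = set(valid)
        splits.append(([i for i in idx if i not in held], valid))
    return splits


def single_split(n, valid_frac=0.3, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = max(1, int(n * valid_frac))
    return [(idx[n_val:], idx[:n_val])]


def stabilized_weight(a_hist, p_num, p_den):
    # Product over visits k of f(A_k | past treatment) / f(A_k | past treatment, covariates);
    # p_num[k] and p_den[k] are the two models' predicted P(A_k = 1).
    sw = 1.0
    for a, pn, pd in zip(a_hist, p_num, p_den):
        sw *= (pn if a else 1 - pn) / (pd if a else 1 - pd)
    return sw


# Made-up toy data: binary covariate L; treatment is more likely when L = 1.
X = [[0]] * 200 + [[1]] * 200
y = [1] * 40 + [0] * 160 + [1] * 160 + [0] * 40
learners = [fit_intercept_only, fit_stratified]

w_sl, p_sl = held_out_blend(X, y, learners, vfold_splits(len(y), V=5))
w_el, p_el = held_out_blend(X, y, learners, single_split(len(y)))
print("SL-style weights:", w_sl, "EL-style weights:", w_el)
print("sw for A=0 at one visit, L=1:", stabilized_weight([0], [sum(y) / len(y)], [p_sl([1])]))
```

With V folds, each candidate is fit V times before the final refit, whereas the single split fits each candidate once. That difference is one plausible source of the shorter EL run time reported in the abstract, though this toy sketch says nothing about the paper's actual timings. In a marginal structural model analysis, `p_sl` or `p_el` would supply `p_den` for each person-visit, and a model using treatment history only would supply `p_num`.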