Institute of Psychology, University of Münster, Münster, Germany.
Stat Med. 2023 Oct 15;42(23):4147-4176. doi: 10.1002/sim.9852. Epub 2023 Aug 2.
There has been growing interest in using nonparametric machine learning approaches for propensity score estimation in order to foster robustness against misspecification of the propensity score model. However, the vast majority of studies have focused on single-level data settings, and research on nonparametric propensity score estimation in clustered data settings is scarce. In this article, we extend existing research by describing a general algorithm for incorporating random effects into a machine learning model, which we implemented for generalized boosted modeling (GBM). In a simulation study, we investigated the performance of logistic regression, GBM, and Bayesian additive regression trees for inverse probability of treatment weighting (IPW) when the data are clustered, the treatment exposure mechanism is nonlinear, and unmeasured cluster-level confounding is present. For each approach, we compared fixed and random effects propensity score models to single-level models and evaluated their use in both marginal and clustered IPW. We additionally investigated the performance of the standard Super Learner and the balance Super Learner. The results showed that when there was no unmeasured confounding, logistic regression resulted in moderate bias in both marginal and clustered IPW, whereas the nonparametric approaches were unbiased. In the presence of cluster-level confounding, fixed and random effects models greatly reduced bias compared to single-level models in marginal IPW, with fixed effects GBM and fixed effects logistic regression performing best. Finally, clustered IPW was overall preferable to marginal IPW, and the balance Super Learner outperformed the standard Super Learner, though neither worked as well as its best candidate model.
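To make the estimation strategy concrete, the sketch below illustrates one way the kind of "fixed effects" boosted propensity score model described in the abstract could be set up: cluster membership is entered as dummy covariates in a gradient boosting classifier (standing in for GBM), and the fitted propensity scores are turned into standard inverse probability of treatment weights for a marginal IPW estimate of the average treatment effect. This is not the authors' implementation or their random effects algorithm; the column names and tuning values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def ipw_ate_fixed_effects(df, covariates, treatment="treatment",
                          outcome="outcome", cluster="cluster"):
    """Illustrative sketch: fixed-effects boosted propensity scores + marginal IPW.

    Assumes hypothetical columns: a binary treatment indicator, a continuous
    outcome, a cluster identifier, and unit-level covariates.
    """
    # Fixed-effects analogue: add one dummy indicator per cluster to the covariates.
    X = pd.get_dummies(df[covariates + [cluster]], columns=[cluster])
    t = df[treatment].to_numpy()
    y = df[outcome].to_numpy()

    # Gradient boosting classifier as a stand-in for generalized boosted modeling (GBM).
    ps_model = GradientBoostingClassifier(n_estimators=500, max_depth=3,
                                          learning_rate=0.01)
    ps_model.fit(X, t)

    # Estimated propensity scores, truncated to avoid extreme weights.
    e = np.clip(ps_model.predict_proba(X)[:, 1], 0.01, 0.99)

    # Standard IPW weights: 1/e for treated units, 1/(1 - e) for controls.
    w = t / e + (1 - t) / (1 - e)

    # Marginal (population-level) Hajek-type IPW estimate of the ATE.
    mean_treated = np.sum(w * t * y) / np.sum(w * t)
    mean_control = np.sum(w * (1 - t) * y) / np.sum(w * (1 - t))
    return mean_treated - mean_control
```

A clustered IPW estimate would instead form the weighted treatment-control contrast within each cluster and then pool the cluster-specific estimates, and the random effects variants compared in the article require the authors' algorithm for incorporating random effects into the machine learning model, which is not reproduced here.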