Center for Health Promotion and Disease Prevention, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Health Serv Res. 2013 Oct;48(5):1798-817. doi: 10.1111/1475-6773.12068. Epub 2013 May 23.
To illustrate the use of ensemble tree-based methods (random forest classification [RFC] and bagging) for propensity score estimation and to compare these methods with logistic regression, in the context of evaluating the effect of physical and occupational therapy on preschool motor ability among very low birth weight (VLBW) children.
We used secondary data from the Early Childhood Longitudinal Study Birth Cohort (ECLS-B) between 2001 and 2006.
We estimated the predicted probability of treatment using tree-based methods and logistic regression (LR). We then modeled the exposure-outcome relation using weighted LR models while considering covariate balance and precision for each propensity score estimation method.
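The estimation step described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: it assumes scikit-learn, the tuning values (`n_estimators`, `min_samples_leaf`), the clipping bounds, the use of out-of-bag predictions, and the unstabilized inverse-probability-of-treatment weights are all choices made here for illustration, and the helper names `estimate_propensity` and `iptw` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def estimate_propensity(X, treated, seed=0):
    """Return {method name: estimated P(treatment | covariates)}."""
    lr = LogisticRegression(max_iter=1000).fit(X, treated)
    rfc = RandomForestClassifier(
        n_estimators=200, min_samples_leaf=5, oob_score=True, random_state=seed
    ).fit(X, treated)
    # BaggingClassifier uses a decision tree as its default base learner.
    bag = BaggingClassifier(
        n_estimators=200, oob_score=True, random_state=seed
    ).fit(X, treated)
    scores = {
        "LR": lr.predict_proba(X)[:, 1],
        # Out-of-bag votes: each row is scored only by trees that did not see it.
        "RFC": rfc.oob_decision_function_[:, 1],
        "bagging": bag.oob_decision_function_[:, 1],
    }
    # Assumed bounds: clip so that no inverse-probability weight becomes infinite.
    return {name: np.clip(p, 0.01, 0.99) for name, p in scores.items()}


def iptw(ps, treated):
    """Unstabilized inverse probability of treatment weights (an assumed scheme)."""
    return np.where(treated == 1, 1.0 / ps, 1.0 / (1.0 - ps))


# Synthetic data only: 500 rows, 4 covariates, treatment depends on the first two.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
treated = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1]))))

propensity = estimate_propensity(X, treated)
weights = {name: iptw(ps, treated) for name, ps in propensity.items()}
```

Each set of weights could then be passed as `sample_weight` when fitting a logistic outcome model, so that effect estimates and their precision can be compared across the three estimation methods.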
Among approximately 500 VLBW children, therapy receipt was associated with moderately improved preschool motor ability. Overall, the ensemble methods produced better covariate balance (mean squared difference: 0.03-0.07) than LR (mean squared difference: 0.11) and the most precise effect estimates. The overall magnitude of the effect estimates was similar for the RFC and LR estimation methods.
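A balance check of this kind can be sketched in a few lines. The abstract does not define its mean squared difference, so the definition below (the average, across covariates, of the squared weighted standardized mean difference between treated and untreated children) is an assumption, as are the function names and the toy data.

```python
from statistics import mean, pvariance


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def standardized_difference(x, treated, weights):
    """Weighted treated-minus-untreated mean of x, in pooled unweighted SD units."""
    x1 = [v for v, t in zip(x, treated) if t == 1]
    x0 = [v for v, t in zip(x, treated) if t == 0]
    w1 = [w for w, t in zip(weights, treated) if t == 1]
    w0 = [w for w, t in zip(weights, treated) if t == 0]
    pooled_sd = ((pvariance(x1) + pvariance(x0)) / 2) ** 0.5
    return (weighted_mean(x1, w1) - weighted_mean(x0, w0)) / pooled_sd


def mean_squared_difference(covariates, treated, weights):
    """Average squared standardized difference across covariates (assumed definition)."""
    return mean(standardized_difference(x, treated, weights) ** 2 for x in covariates)


# Toy data: a single covariate that is imbalanced before weighting.
treated = [1, 1, 1, 0, 0, 0]
x = [3.0, 3.0, 1.0, 3.0, 1.0, 1.0]
unweighted = [1.0] * 6
# Hypothetical weights that up-weight the under-represented value in each group.
balancing = [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]

before = mean_squared_difference([x], treated, unweighted)
after = mean_squared_difference([x], treated, balancing)
```

Values closer to zero indicate that the weighted treated and untreated groups have more similar covariate means; in this toy case the weights equalize the two group means exactly.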
Propensity score estimation using RFC and bagging produced better covariate balance with increased precision compared to LR. Ensemble methods are a useful alternative to logistic regression for controlling confounding in observational studies.