Wang Kaicheng, Rosman Lindsey, Lu Haidong
Yale Center for Analytical Sciences, Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut.
Division of Cardiology, Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina.
medRxiv. 2025 Jun 17:2025.06.16.25329708. doi: 10.1101/2025.06.16.25329708.
Machine learning (ML) approaches for propensity score estimation are increasingly used with the expectation of improving covariate balance and reducing bias, but their validity in selecting appropriate confounders remains controversial. In this study, we estimated the effectiveness of sacubitril/valsartan versus angiotensin-converting enzyme inhibitors or angiotensin receptor blockers on all-cause mortality among heart failure patients with implantable cardioverter defibrillators in the U.S. Department of Veterans Affairs from 2016 to 2020. We compared results from traditional logistic regression-based and ML-based propensity score methods and benchmarked them against the PARADIGM-HF randomized trial. The estimate from logistic regression with confounder selection (HR = 0.93, 95% CI 0.61 - 1.42; 27-month RR = 0.87, 95% CI 0.59 - 1.21) most closely aligned with the trial result (HR = 0.81; 95% CI 0.61 - 1.06). In contrast, generalized boosting models did not outperform traditional logistic regression and may amplify bias when combined with data-driven confounder selection (HR = 0.63, 95% CI 0.31 - 1.30; RR = 0.61, 95% CI 0.33 - 1.04). Our findings suggest that ML-based propensity scores may introduce overadjustment bias, and they underscore the importance of subject-matter knowledge in causal inference with high-dimensional real-world data.
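The abstract does not include code. As a hedged illustration of how a propensity-score-adjusted estimate differs from a crude one, the sketch below fits a logistic-regression propensity score by Newton-Raphson and uses inverse probability of treatment weighting (IPTW) to estimate a risk ratio. Weighting is assumed here as one common way to use a propensity score; the abstract does not state which adjustment the authors used. The simulated data, the single binary confounder L, and every function name are hypothetical, and the true risk ratio of 0.8 is a chosen simulation parameter, not a study result. A boosted-tree propensity model, as in the generalized boosting comparison, would take the place of fit_logistic and is not shown.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, iters=25, tol=1e-10):
    """Logistic regression by Newton-Raphson; each row of X starts with 1.0 (intercept)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                       # X^T (y - mu)
        info = [[0.0] * p for _ in range(p)]   # X^T W X, W = mu (1 - mu)
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * x for b, x in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def weighted_risk(w, treat, outcome, arm):
    """Weighted proportion with the outcome among subjects in the given arm."""
    num = sum(wi * yi for wi, ti, yi in zip(w, treat, outcome) if ti == arm)
    den = sum(wi for wi, ti in zip(w, treat) if ti == arm)
    return num / den


def iptw_risk_ratio(X, treat, outcome):
    """Treated vs untreated risk ratio using inverse probability of treatment weights."""
    beta = fit_logistic(X, treat)
    ps = [sigmoid(sum(b * x for b, x in zip(beta, xi))) for xi in X]
    w = [1.0 / p if t == 1 else 1.0 / (1.0 - p) for t, p in zip(treat, ps)]
    return weighted_risk(w, treat, outcome, 1) / weighted_risk(w, treat, outcome, 0)


def simulate(n, seed=2025):
    """Hypothetical data: binary confounder L raises both treatment uptake and baseline risk."""
    rng = random.Random(seed)
    X, treat, outcome = [], [], []
    for _ in range(n):
        L = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < 0.3 + 0.4 * L else 0
        risk = 0.1 * (1 + L) * (0.8 if a else 1.0)  # true risk ratio is 0.8 in every stratum
        X.append([1.0, float(L)])
        treat.append(a)
        outcome.append(1 if rng.random() < risk else 0)
    return X, treat, outcome


X, treat, outcome = simulate(40_000)
ones = [1.0] * len(treat)
crude_rr = weighted_risk(ones, treat, outcome, 1) / weighted_risk(ones, treat, outcome, 0)
iptw_rr = iptw_risk_ratio(X, treat, outcome)
print(f"crude RR = {crude_rr:.2f}, IPTW RR = {iptw_rr:.2f} (true 0.80)")
```

With one binary confounder the logistic model is saturated, so the estimated propensity scores equal the stratum-specific treatment proportions and the weighted risk ratio should land near 0.8. The crude ratio is pushed upward (to about 1.05 in expectation under these assumed parameters) because treated subjects are more likely to have the high-risk value of L.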