Stanford Center for Biomedical Informatics Research, Stanford University, CA, United States of America.
J Biomed Inform. 2023 Jul;143:104420. doi: 10.1016/j.jbi.2023.104420. Epub 2023 Jun 14.
To apply the latest guidance for estimating and evaluating heterogeneous treatment effects (HTEs) in an end-to-end case study of the Randomized Evaluation of Long-term Anticoagulation Therapy (RE-LY) trial, and to summarize the main takeaways from an in-depth application of state-of-the-art metalearners and novel evaluation metrics, in order to inform their use for personalized care in biomedical research.
Based on the characteristics of the RE-LY data, we selected four metalearners (S-learner with Lasso, X-learner with Lasso, R-learner with random survival forest and Lasso, and causal survival forest) to estimate the HTEs of dabigatran. For the outcomes of (1) stroke or systemic embolism and (2) major bleeding, we compared dabigatran 150 mg, dabigatran 110 mg, and warfarin. We assessed the overestimation of treatment heterogeneity by the metalearners via a global null analysis and their discrimination and calibration ability using two novel metrics: rank-weighted average treatment effects (RATE) and estimated calibration error for treatment heterogeneity. Finally, we visualized the relationships between estimated treatment effects and baseline covariates using partial dependence plots.
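To make the discrimination metric concrete, the following is a minimal sketch of one common RATE variant, the area under the targeting operator characteristic curve (AUTOC). It is not the study's implementation. It assumes a fully observed (uncensored) outcome and a randomized trial with known treatment probability, and it uses simple inverse-probability-weighted scores instead of the censoring-adjusted doubly robust scores a survival analysis would need. All function names are hypothetical.

```python
import random


def ipw_scores(y, w, e=0.5):
    """Per-subject effect scores for a randomized trial.

    Assumption: the treatment probability e is known, so
    y * (w - e) / (e * (1 - e)) is unbiased for the treatment effect.
    """
    return [yi * (wi - e) / (e * (1 - e)) for yi, wi in zip(y, w)]


def toc_curve(priority, scores):
    """TOC at q = k/n: mean score among the k highest-priority subjects
    minus the overall mean score."""
    n = len(scores)
    overall = sum(scores) / n
    # Rank subjects from highest to lowest estimated benefit.
    order = sorted(range(n), key=lambda i: priority[i], reverse=True)
    curve, running = [], 0.0
    for k, i in enumerate(order, start=1):
        running += scores[i]
        curve.append(running / k - overall)
    return curve


def autoc(priority, scores):
    """Area under the TOC curve, i.e. RATE with uniform weighting."""
    curve = toc_curve(priority, scores)
    return sum(curve) / len(curve)


if __name__ == "__main__":
    # Toy data (not from RE-LY): the true effect equals the covariate x.
    rng = random.Random(0)
    n = 2000
    x = [rng.random() for _ in range(n)]
    w = [rng.randint(0, 1) for _ in range(n)]
    y = [xi * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
    scores = ipw_scores(y, w)

    informative = x                               # ranks by the true effect
    uninformative = [rng.random() for _ in range(n)]  # ranks at random
    print("AUTOC, informative ranking:  ", autoc(informative, scores))
    print("AUTOC, uninformative ranking:", autoc(uninformative, scores))
```

An AUTOC near zero is what one would expect either when the ranking carries no information about who benefits or when there is no heterogeneity to detect. In practice the estimate is reported with a standard error, for example from a bootstrap, which this sketch omits.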
The RATE metric suggested that, for both the stroke/SE and major bleeding outcomes and for every treatment comparison, either the applied metalearners estimated HTEs poorly or there was no treatment heterogeneity. Partial dependence plots revealed that several covariates had consistent relationships with the treatment effects estimated by multiple metalearners. The applied metalearners performed differently across outcomes and treatment comparisons, and the X- and R-learners yielded smaller calibration errors than the others.
HTE estimation is difficult, and a principled estimation and evaluation process is necessary to provide reliable evidence and prevent false discoveries. We have demonstrated how to choose appropriate metalearners based on specific data properties, applied them using the off-the-shelf implementation tool survlearners, and evaluated their performance using recently defined formal metrics. We suggest that clinical implications should be drawn based on the common trends across the applied metalearners.