CAUSALab, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Eur J Epidemiol. 2022 Apr;37(4):367-376. doi: 10.1007/s10654-022-00855-8. Epub 2022 Feb 22.
The accuracy of a prediction algorithm depends on contextual factors that may vary across deployment settings. To address this inherent limitation of prediction, we propose an approach to counterfactual prediction based on the g-formula to predict risk across populations that differ in their distribution of treatment strategies. We apply this approach to predict the 5-year risk of mortality among persons receiving care for HIV in the U.S. Veterans Health Administration under different hypothetical treatment strategies. First, we implement a conventional approach to develop a prediction algorithm in the observed data and show how the algorithm may fail when transported to new populations with different treatment strategies. Second, we generate counterfactual data under different treatment strategies and use them both to assess the robustness of the original algorithm's performance to these differences and to develop counterfactual prediction algorithms. We discuss why estimating counterfactual risks under a particular treatment strategy is more challenging than conventional prediction: it requires the same data, methods, and unverifiable assumptions as causal inference. However, counterfactual prediction may be required when the alternative assumption of constant treatment patterns across deployment settings is unlikely to hold and new data are not yet available to retrain the algorithm.
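To illustrate the contrast between conventional and counterfactual prediction, the following is a minimal sketch of the g-formula for a single point treatment. It is not the authors' implementation, which handles time-varying treatment strategies. The data-generating process, the variable names (`l`, `a`, `y`), the function names, and all numeric values are hypothetical assumptions chosen for illustration, and the sketch assumes that treatment is unconfounded given the single binary covariate `l`, which holds here only because the data are simulated that way.

```python
import numpy as np

def simulate(n, p_treat_given_l, rng):
    """Simulate a hypothetical cohort: binary covariate L, treatment A, outcome Y."""
    l = rng.binomial(1, 0.4, size=n)
    # Treatment probability depends on L (this is the "treatment pattern").
    a = rng.binomial(1, np.where(l == 1, p_treat_given_l[1], p_treat_given_l[0]))
    # Assumed true risk: higher when L=1, lower when treated.
    p_y = 0.10 + 0.30 * l - 0.08 * a - 0.12 * a * l
    y = rng.binomial(1, p_y)
    return l, a, y

def outcome_model(l, a, y):
    """Nonparametric estimate of P(Y=1 | L=l, A=a), stored as table[l, a]."""
    table = np.zeros((2, 2))
    for li in (0, 1):
        for ai in (0, 1):
            mask = (l == li) & (a == ai)
            table[li, ai] = y[mask].mean()
    return table

def g_formula_risk(table, strategy):
    """Risk given L under a strategy that treats with probability strategy[l].

    Computes sum over a of P(Y=1 | L=l, A=a) * g(a | l) for l = 0 and l = 1.
    """
    g = np.asarray(strategy, dtype=float)
    return table[:, 0] * (1.0 - g) + table[:, 1] * g

rng = np.random.default_rng(0)
# Source population: 20% treated when L=0, 50% treated when L=1 (assumed).
l, a, y = simulate(200_000, (0.2, 0.5), rng)

# Conventional prediction: risk given L under the observed treatment pattern.
observed_risk_by_l = np.array([y[l == 0].mean(), y[l == 1].mean()])

# Counterfactual prediction: risk given L if everyone were treated.
table = outcome_model(l, a, y)
cf_risk_treat_all = g_formula_risk(table, [1.0, 1.0])

print("conventional risk by L: ", observed_risk_by_l)
print("treat-all risk by L:    ", cf_risk_treat_all)
```

In this sketch the conventional prediction implicitly averages over the source population's treatment pattern, so it would be miscalibrated in a deployment setting where everyone is treated. The g-formula prediction instead fixes the treatment strategy explicitly, at the cost of relying on the no-unmeasured-confounding assumption.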