Gur Ali Ozden
College of Administrative Sciences and Economics, Koç University, Istanbul, Turkey.
Front Artif Intell. 2022 Dec 7;5:1015604. doi: 10.3389/frai.2022.1015604. eCollection 2022.
Efficient allocation of limited resources relies on accurate estimates of the potential incremental benefit for each candidate. These heterogeneous treatment effects (HTE) can be estimated with properly specified theory-driven models and observational data that contain all confounders. Using causal machine learning to estimate HTE from big data can yield higher benefits from the same limited resources, because it identifies additional heterogeneity dimensions and fits arbitrary functional forms and interactions, but decisions based on black-box models are not justifiable.
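To make the idea of targeting by estimated HTE concrete, the following is a minimal sketch rather than the method of this paper. It assumes synthetic data with a randomized treatment (so there is no confounding), uses a simple two-model linear estimator in place of causal machine learning ensembles, and introduces hypothetical helper names (`fit_linear`, `estimate_hte`, `select_top_k`) purely for illustration.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict outcomes from coefficients produced by fit_linear."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return Xb @ coef


def estimate_hte(X, treated, y):
    """Two-model estimate: predicted outcome if treated minus predicted outcome if not."""
    coef_t = fit_linear(X[treated], y[treated])
    coef_c = fit_linear(X[~treated], y[~treated])
    return predict_linear(coef_t, X) - predict_linear(coef_c, X)


def select_top_k(effects, k):
    """Indices of the k candidates with the largest estimated incremental benefit."""
    return np.argsort(effects)[::-1][:k]


# Synthetic data (an assumption for illustration, not the survey data of the study).
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
treated = rng.random(n) < 0.5              # randomized assignment
true_effect = 1.0 + 0.8 * X[:, 0]          # effect varies with the first covariate
y = 0.5 * X[:, 1] + treated * true_effect + rng.normal(scale=0.5, size=n)

effects = estimate_hte(X, treated, y)
chosen = select_top_k(effects, k=500)      # budget covers 500 candidates

targeted_benefit = true_effect[chosen].mean()
average_benefit = true_effect.mean()
print(f"mean true effect, targeted: {targeted_benefit:.2f}")
print(f"mean true effect, everyone: {average_benefit:.2f}")
```

With a fixed budget of `k` candidates, ranking by estimated effect concentrates the resource on those expected to benefit most. With observational data, the estimator would also have to adjust for confounders.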
Our solution is designed to increase resource allocation efficiency, enhance the understanding of the treatment effects, and increase the acceptance of the resulting decisions with a rationale that is in line with existing theory. The case study identifies the right individuals to incentivize to increase their physical activity, so as to maximize the population's health benefits from reduced diabetes and heart disease prevalence. We leverage large-scale data from multi-wave nationally representative health surveys and theory from published global meta-analysis results. We train causal machine learning ensembles, use explainable AI to extract the heterogeneity dimensions of the treatment effect and the sign and monotonicity of its moderators, and incorporate them into the theory-driven model with our generalized linear model with qualitative constraints (GLM_QC) method.
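One way to picture a qualitative constraint in a generalized linear model is a logistic regression whose coefficients are forced to carry a prescribed sign. The sketch below is an illustration under stated assumptions, not the GLM_QC method itself: it uses projected gradient descent, handles only sign constraints on linear terms (which makes each constrained effect monotone), and runs on synthetic data. The function name `fit_sign_constrained_logit` and the `signs` encoding are hypothetical.

```python
import numpy as np


def fit_sign_constrained_logit(X, y, signs, lr=0.1, n_iter=2000):
    """Logistic regression fitted by projected gradient descent.

    signs[j] = +1 forces coefficient j to be >= 0, -1 forces it to be <= 0,
    and 0 leaves it free. The intercept is never constrained.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    signs = np.concatenate([[0], np.asarray(signs)])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ beta)))     # predicted probabilities
        grad = Xb.T @ (p - y) / len(y)             # gradient of the mean log-loss
        beta = beta - lr * grad
        # Projection step: clip coefficients back into their allowed half-line.
        beta = np.where(signs > 0, np.maximum(beta, 0.0), beta)
        beta = np.where(signs < 0, np.minimum(beta, 0.0), beta)
    return beta


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
true_beta = np.array([-0.5, 1.0, -1.0, 0.05])      # intercept, then three slopes
logits = true_beta[0] + X @ true_beta[1:]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

# Prior knowledge, e.g. from theory or from an explainable-AI summary:
# effect 1 is non-negative, effect 2 is non-positive, effect 3 is non-negative.
signs = [+1, -1, +1]
beta = fit_sign_constrained_logit(X, y, signs)
print("constrained coefficients:", np.round(beta, 3))
```

A coefficient whose unconstrained estimate would take the wrong sign is held at zero by the projection, so the constraint also acts as a form of shrinkage.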
The results show that the proposed methodology improves the expected health benefits for diabetes by 11% and for heart disease by 9% compared to the traditional approach of using the model specification from the literature and estimating the model with large-scale data. Qualitative constraints not only prevent counter-intuitive effects but also improve achieved benefits by regularizing the model.