Hamaya Rikuta, Hara Konan, Manson JoAnn E, Rimm Eric B, Sacks Frank M, Xue Qiaochu, Qi Lu, Cook Nancy R
Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, 900 Commonwealth Avenue East, Boston, MA, USA.
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Eur J Epidemiol. 2025 Feb;40(2):151-166. doi: 10.1007/s10654-024-01185-7. Epub 2025 Feb 13.
Machine learning (ML) methods for analyzing heterogeneous treatment effects (HTE) are gaining prominence in the medical and epidemiological communities, offering potential breakthroughs in precision medicine by enabling the prediction of individual responses to treatments. This paper introduces the methodological frameworks used to study HTE, particularly those based on a single randomized controlled trial (RCT). We focus on methods to estimate the conditional average treatment effect (CATE) given multiple covariates, aiming to predict individualized treatment effects. We explore a range of methodologies, from basic frameworks such as the T-learner, S-learner, and Causal Forest to more advanced ones such as the DR-learner and R-learner, as well as cross-validation for CATE estimation, which enhances statistical efficiency by estimating CATE for all RCT participants. We also provide a practical application of these approaches using the Preventing Overweight Using Novel Dietary Strategies (POUNDS Lost) trial, which compared the effects of high-fat versus low-fat diet interventions on 2-year weight changes. We compared different sets of covariates for CATE estimation, showing that the DR- and R-learners are useful for estimating CATE in high-dimensional settings. This paper aims to explain the theoretical underpinnings and methodological nuances of ML-based HTE analysis without relying on technical jargon, making these concepts more accessible to the clinical and epidemiological research communities.
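As a rough illustration of two of the estimators named in the abstract, the sketch below implements a T-learner and a cross-fitted DR-learner on simulated RCT data. It is not the authors' implementation: the base learner is ordinary least squares, swapped in for ML models to keep the example self-contained, the randomization probability is assumed known (0.5), and the function names (`simulate_rct`, `t_learner`, `dr_learner`) and the simulated data are hypothetical, unrelated to the POUNDS Lost trial.

```python
import numpy as np


def simulate_rct(n=2000, seed=0):
    """Simulate a 1:1 randomized trial with a known CATE (illustration only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))          # three baseline covariates
    t = rng.integers(0, 2, size=n)       # randomized treatment, P(T=1) = 0.5
    tau = 1.0 + 2.0 * X[:, 0]            # true CATE depends on covariate 0
    y = X[:, 1] + t * tau + rng.normal(size=n)
    return X, t, y, tau


def fit_linear(X, y):
    """Least-squares fit with an intercept; stands in for any ML base learner."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def t_learner(X, t, y, X_new):
    """T-learner: fit one outcome model per arm, then take the difference."""
    coef1 = fit_linear(X[t == 1], y[t == 1])
    coef0 = fit_linear(X[t == 0], y[t == 0])
    return predict_linear(coef1, X_new) - predict_linear(coef0, X_new)


def dr_learner(X, t, y, X_new, p=0.5, n_folds=5, seed=0):
    """DR-learner: regress cross-fitted doubly robust pseudo-outcomes on X.

    p is the known randomization probability of the trial (an assumption here).
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds   # balanced random fold labels
    pseudo = np.empty(len(y))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Outcome models are fitted on the other folds only (cross-fitting).
        coef1 = fit_linear(X[train & (t == 1)], y[train & (t == 1)])
        coef0 = fit_linear(X[train & (t == 0)], y[train & (t == 0)])
        mu1 = predict_linear(coef1, X[test])
        mu0 = predict_linear(coef0, X[test])
        t_k, y_k = t[test], y[test]
        # Doubly robust (AIPW-style) pseudo-outcome for each held-out person.
        pseudo[test] = (mu1 - mu0
                        + t_k * (y_k - mu1) / p
                        - (1 - t_k) * (y_k - mu0) / (1 - p))
    # Final stage: model the pseudo-outcome as a function of the covariates.
    return predict_linear(fit_linear(X, pseudo), X_new)


if __name__ == "__main__":
    X, t, y, tau = simulate_rct()
    for name, est in [("T-learner", t_learner(X, t, y, X)),
                      ("DR-learner", dr_learner(X, t, y, X))]:
        print(name, "mean absolute error:", np.mean(np.abs(est - tau)))
```

Because the cross-fitted pseudo-outcomes are computed for every participant from models trained on the other folds, the final-stage regression can use the full trial sample. This is one way to read the abstract's point about estimating CATE for all RCT participants.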