1 Department of Biostatistics & Epidemiology, School of Public Health & Health Sciences, University of Massachusetts, Amherst, MA, USA.
2 Netflix, Los Gatos, CA, USA.
Stat Methods Med Res. 2019 Jun;28(6):1761-1780. doi: 10.1177/0962280218774936. Epub 2018 Jun 19.
We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual's covariates on another's outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.
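The point that individual-level outcomes are correlated within a cluster can be illustrated with a small simulation. The sketch below is not the authors' TMLE and does not adjust for covariates. It assumes a hypothetical randomized design with equal-sized clusters, a single shared cluster-level factor, and a true effect of 0.5. All function and variable names are illustrative. It contrasts a cluster-level difference in means with a naive analysis that treats individuals as independent.

```python
import random
import statistics as st


def simulate_clusters(n_clusters=100, n_per_cluster=50, effect=0.5, seed=0):
    """Simulate a cluster-level exposure with correlated individual outcomes.

    Hypothetical data-generating process (an assumption, not from the paper):
    half the clusters are exposed at random, and every individual in a
    cluster shares one unmeasured cluster-level factor.
    """
    rng = random.Random(seed)
    exposure = [0] * (n_clusters // 2) + [1] * (n_clusters - n_clusters // 2)
    rng.shuffle(exposure)
    outcomes = []
    for a in exposure:
        shared = rng.gauss(0.0, 1.0)  # factor shared by the whole cluster
        outcomes.append(
            [effect * a + shared + rng.gauss(0.0, 1.0) for _ in range(n_per_cluster)]
        )
    return exposure, outcomes


def diff_in_means(y1, y0):
    """Unadjusted difference in means with a two-sample standard error."""
    est = st.mean(y1) - st.mean(y0)
    se = (st.variance(y1) / len(y1) + st.variance(y0) / len(y0)) ** 0.5
    return est, se


exposure, outcomes = simulate_clusters()

# Cluster-level analysis: one summary outcome per cluster, so the cluster
# is the independent unit.
cluster_means = [st.mean(ys) for ys in outcomes]
est_cluster, se_cluster = diff_in_means(
    [m for a, m in zip(exposure, cluster_means) if a == 1],
    [m for a, m in zip(exposure, cluster_means) if a == 0],
)

# Naive analysis: pool individuals and ignore the within-cluster correlation.
est_naive, se_naive = diff_in_means(
    [y for a, ys in zip(exposure, outcomes) if a == 1 for y in ys],
    [y for a, ys in zip(exposure, outcomes) if a == 0 for y in ys],
)

print(f"cluster-level: estimate={est_cluster:.3f}, SE={se_cluster:.3f}")
print(f"naive pooled:  estimate={est_naive:.3f}, SE={se_naive:.3f}")
```

With equal-sized clusters the two point estimates coincide, but the naive standard error is much smaller because it counts correlated individuals as independent observations. The estimators described in the abstract go further by using individual-level covariates to improve efficiency while keeping inference valid at the cluster level.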