Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark.
Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark.
Int J Epidemiol. 2022 Oct 13;51(5):1622-1636. doi: 10.1093/ije/dyac078.
Nearly all diseases are caused by different combinations of exposures. Yet, most epidemiological studies focus on estimating the effect of a single exposure on a health outcome. We present the Causes of Outcome Learning approach (CoOL), which seeks to discover combinations of exposures that lead to an increased risk of a specific outcome in parts of the population. The approach allows for exposures acting alone and in synergy with others. The road map of CoOL involves (i) a pre-computational phase used to define a causal model; (ii) a computational phase with three steps, namely (a) fitting a non-negative model on an additive scale, (b) decomposing risk contributions and (c) clustering individuals based on the risk contributions into subgroups; and (iii) a post-computational phase on hypothesis development, validation and triangulation using new data before eventually updating the causal model. The computational phase uses a tailored neural network for the non-negative model on an additive scale and layer-wise relevance propagation for the risk decomposition through this model. We demonstrate the approach on simulated and real-life data using the R package 'CoOL'. The presentation focuses on binary exposures and outcomes but can also be extended to other measurement types. This approach encourages and enables researchers to identify combinations of exposures as potential causes of the health outcome of interest. Expanding our ability to discover complex causes could eventually result in more effective, targeted and informed interventions prioritized for their public health impact.
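The computational phase described in the abstract can be illustrated with a toy sketch in Python. It is not the API of the R package 'CoOL', and all names in it are hypothetical. To stay short it skips step (a): the parameters of a small non-negative additive model are set by hand rather than fitted, with exposure A acting alone and exposures B and C adding risk only in combination. For step (b) it uses a simple proportional redistribution rule in the spirit of layer-wise relevance propagation. For step (c) it groups individuals with identical contribution patterns, which stands in for a real clustering algorithm.

```python
# Illustrative sketch only: parameters are hand-set (assumed), not fitted,
# and every name here is hypothetical rather than part of the 'CoOL' package.

EXPOSURES = ["A", "B", "C"]
BASELINE = 0.05  # assumed baseline risk of an unexposed individual

# One entry per hidden node: non-negative exposure weights plus a bias.
# Node 0: A adds risk on its own.
# Node 1: the negative bias means B and C add risk only when both are present.
HIDDEN = [
    {"weights": [0.10, 0.00, 0.00], "bias": 0.00},
    {"weights": [0.00, 0.15, 0.15], "bias": -0.15},
]


def decompose(x):
    """Return (risk, contributions) for one individual's binary exposures."""
    risk = BASELINE
    contributions = [0.0] * len(x)
    for node in HIDDEN:
        inputs = [xi * w for xi, w in zip(x, node["weights"])]
        total = sum(inputs)
        activation = max(0.0, total + node["bias"])  # ReLU keeps risk >= 0
        risk += activation  # additive scale: node risks are summed
        if activation > 0 and total > 0:
            # Redistribute the node's risk to the exposures in proportion
            # to each exposure's share of the node's input.
            for i, value in enumerate(inputs):
                contributions[i] += activation * value / total
    return risk, contributions


def group_by_contribution(population):
    """Group individuals that share the same rounded contribution pattern."""
    groups = {}
    for x in population:
        _, contributions = decompose(x)
        key = tuple(round(c, 3) for c in contributions)
        groups.setdefault(key, []).append(x)
    return groups


population = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 1, 1), (1, 1, 1), (0, 0, 1)]
groups = group_by_contribution(population)
for pattern, members in groups.items():
    print(dict(zip(EXPOSURES, pattern)), members)
```

In this toy setup, an individual exposed to B only receives no excess risk and falls into the same subgroup as the unexposed, whereas individuals exposed to both B and C form their own subgroup with the excess risk split between the two exposures. With real data, the parameters would be estimated and the subgroups found by a clustering method chosen by the analyst.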