Kakampakou Lydia, Stokes Jonathan, Hoehn Andreas, de Kamps Marc, Lawniczak Wiktoria, Arnold Kellyn F, Hensor Elizabeth M A, Heppenstall Alison J, Gilthorpe Mark S
Department of Mathematics and Statistics, Lancaster University, Fylde College, Lancaster, LA1 4YF, UK.
MRC/CSO Social and Public Health Sciences Unit, School of Health and Wellbeing, University of Glasgow, Clarice Pears Building, 90 Byres Road, Glasgow, G12 8TB, UK.
BMC Med Res Methodol. 2025 Mar 22;25(1):79. doi: 10.1186/s12874-025-02504-6.
Understanding causality, rather than mere association, is vital for researchers wishing to inform policy and decision making, for example when seeking to improve population health outcomes. Yet contemporary causal inference methods have not fully tackled the complexity of data hierarchies, such as the clustering of people within households, neighbourhoods, cities, or regions, even though such hierarchies are the rule rather than the exception. Understanding these hierarchies is important for complex population outcomes such as non-communicable diseases, which are affected by social determinants operating at different levels of the data hierarchy. The alternative of analysing aggregated data can introduce well-known biases, such as the ecological fallacy or the modifiable areal unit problem. We devise a hierarchical causal diagram that encodes the multilevel data-generating mechanism anticipated when evaluating non-communicable diseases in a population, and we use this diagram to inform data simulation. We also provide a flexible tool to generate synthetic population data that captures all multilevel causal structures, including a cross-level effect due to cluster size. For the first time, we can then quantify the ecological fallacy within a formal causal framework, showing that individual-level data are essential to assess causal relationships that affect the individual. This study also illustrates the importance of causally structured synthetic data for use with other methods, such as Agent Based Modelling or Microsimulation Modelling. Many methodological challenges remain for robust causal evaluation of multilevel data, but this study provides a foundation for investigating them.
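The ecological fallacy named in the abstract can be illustrated with a minimal simulation sketch. This is not the authors' tool or their causal diagram: it assumes a toy two-level model in which a hypothetical cluster-level variable `u` raises the mean exposure in each cluster and also lowers the outcome, and it omits the cross-level effect of cluster size. The names `simulate`, `slope`, and `cluster_means`, and all parameter values, are illustrative assumptions.

```python
import numpy as np


def simulate(n_clusters=200, cluster_size=50, beta=0.5, gamma=-2.0, seed=0):
    """Toy two-level data: individuals nested in equally sized clusters.

    beta  : assumed individual-level causal effect of exposure x on outcome y
    gamma : assumed effect of the cluster-level variable u on y
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n_clusters)  # hypothetical cluster-level cause
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    x = u[cluster] + rng.normal(size=cluster.size)  # u shifts exposure
    y = beta * x + gamma * u[cluster] + rng.normal(size=cluster.size)
    return cluster, x, y


def slope(x, y):
    """Ordinary least squares slope of y on x (with intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def cluster_means(cluster, values):
    """Mean of `values` within each cluster, indexed by cluster id."""
    return np.bincount(cluster, weights=values) / np.bincount(cluster)


cluster, x, y = simulate()
x_bar = cluster_means(cluster, x)
y_bar = cluster_means(cluster, y)

# Individual-level analysis: remove cluster means so u cannot confound.
within = slope(x - x_bar[cluster], y - y_bar[cluster])

# Aggregated analysis: regress cluster-mean outcome on cluster-mean exposure.
ecological = slope(x_bar, y_bar)

print(f"within-cluster slope: {within:.2f}")
print(f"ecological slope:     {ecological:.2f}")
```

Under these assumptions the within-cluster slope should sit close to the simulated individual-level effect of 0.5, while the slope fitted to cluster means is negative. Reading the aggregated slope as an individual-level effect would therefore get even the sign wrong, which is the kind of discrepancy the abstract proposes to quantify within a formal causal framework.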