Ribeiro Adèle H, Crnkovic Milena, Pereira Jaqueline Lopes, Fisberg Regina Mara, Sarti Flavia Mori, Rogero Marcelo Macedo, Heider Dominik, Cerqueira Andressa
Institute of Medical Informatics, University of Münster, Münster, Germany.
Department of Statistics, Federal University of São Carlos (UFSCar), São Carlos, Brazil.
Front Genet. 2024 Dec 9;15:1436947. doi: 10.3389/fgene.2024.1436947. eCollection 2024.
Cardiometabolic diseases, a major global health concern, stem from complex interactions of lifestyle, genetics, and biochemical markers. While extensive research has revealed strong associations between various risk factors and these diseases, latent confounding and the limitations of existing causal discovery methods hinder understanding of their causal relationships. That understanding is essential for mechanistic insight and for developing effective prevention and intervention strategies.
We introduce anchorFCI, a novel adaptation of the conservative Really Fast Causal Inference (RFCI) algorithm, designed to enhance robustness and discovery power in causal learning by strategically selecting and integrating reliable anchor variables from a set of variables known not to be caused by the variables of interest. This approach is well-suited for studies of phenotypic, clinical, and sociodemographic data, using genetic variables that are recognized to be unaffected by these factors. We demonstrate the method's effectiveness through simulation studies and a comprehensive causal analysis of the 2015 ISA-Nutrition dataset, featuring both anchorFCI for causal discovery and state-of-the-art effect size identification tools from Judea Pearl's framework, showcasing a robust, fully data-driven causal inference pipeline.
Our simulation studies reveal that anchorFCI effectively enhances robustness and discovery power while handling latent confounding by integrating reliable anchor variables and their non-ancestral relationships. The 2015 ISA-Nutrition dataset analysis not only supports many established causal relationships but also elucidates their interconnections, providing a clearer understanding of the complex dynamics and multifaceted nature of cardiometabolic risk.
AnchorFCI holds significant potential for reliable causal discovery in complex, multidimensional datasets. By effectively integrating non-ancestral knowledge and addressing latent confounding, it is well-suited for various applications requiring robust causal inference from observational studies, providing valuable insights in epidemiology, genetics, and public health.
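As an illustration of the kind of non-ancestral knowledge referred to above, the following minimal Python sketch encodes the assumption that no variable of interest is an ancestor of any anchor variable, and checks whether a candidate directed graph respects it. This is not the anchorFCI implementation: the function names, the variable names (`snp_1`, `bmi`, and so on), and the adjacency-dictionary graph representation are all hypothetical and chosen only for illustration.

```python
from collections import deque


def non_ancestral_constraints(anchors, variables):
    """Return pairs (v, a) meaning: variable v must not be an ancestor of anchor a."""
    return {(v, a) for v in variables for a in anchors}


def descendants(graph, source):
    """Collect every node reachable from `source` along directed edges.

    `graph` maps each node to a list of its children (hypothetical format).
    """
    seen = set()
    queue = deque(graph.get(source, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, ()))
    return seen


def violations(graph, constraints):
    """List the constraints (v, a) that the graph breaks, i.e. a descends from v."""
    return sorted((v, a) for v, a in constraints if a in descendants(graph, v))


# Hypothetical example: genetic variants as anchors, phenotypes as variables of interest.
anchors = ["snp_1", "snp_2"]
variables = ["bmi", "glucose"]
constraints = non_ancestral_constraints(anchors, variables)

# A graph in which anchors only point into phenotypes satisfies the constraints.
ok_graph = {"snp_1": ["bmi"], "bmi": ["glucose"]}

# A graph with a directed path from a phenotype into an anchor does not.
bad_graph = {"bmi": ["glucose"], "glucose": ["snp_2"]}

print(violations(ok_graph, constraints))
print(violations(bad_graph, constraints))
```

In this sketch the constraint is checked on a fully directed graph for simplicity. A causal discovery algorithm that allows latent confounders would instead apply such knowledge to a partially oriented graph during orientation.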