Causal Discovery with Generalized Linear Models through Peeling Algorithms.

Authors

Wang Minjie, Shen Xiaotong, Pan Wei

Affiliations

Department of Mathematics and Statistics, Binghamton University, State University of New York, Binghamton, NY 13902, USA.

School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA.

Publication information

J Mach Learn Res. 2024;25.

PMID:39758585

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11699566/

Abstract

This article presents a novel method for causal discovery with generalized structural equation models suited for analyzing diverse types of outcomes, including discrete, continuous, and mixed data. Causal discovery often faces challenges due to unmeasured confounders that hinder the identification of causal relationships. The proposed approach addresses this issue by developing two peeling algorithms (bottom-up and top-down) to ascertain causal relationships and valid instruments. This approach first reconstructs a super-graph to represent ancestral relationships between variables, using a peeling algorithm based on nodewise GLM regressions that exploit relationships between primary and instrumental variables. Then, it estimates parent-child effects from the ancestral relationships using another peeling algorithm while deconfounding a child's model with information borrowed from its parents' models. The article offers a theoretical analysis of the proposed approach, establishing conditions for model identifiability and providing statistical guarantees for accurately discovering parent-child relationships via the peeling algorithms. Furthermore, the article presents numerical experiments showcasing the effectiveness of our approach in comparison to state-of-the-art structure learning methods without confounders. Lastly, it demonstrates an application to Alzheimer's disease (AD), highlighting the method's utility in constructing gene-to-gene and gene-to-disease regulatory networks involving Single Nucleotide Polymorphisms (SNPs) for healthy and AD subjects.
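The abstract describes two graph-side ideas: peeling nodes off a directed acyclic graph in a bottom-up order, and a super-graph of ancestral relationships that contains the parent-child edges. The sketch below illustrates only those two ideas on a small, fully known toy graph. It is not the authors' estimator: in the paper the graph is unknown and is recovered from data through nodewise GLM regressions and instrumental variables, none of which appears here. The function names and the toy graph are hypothetical.

```python
from typing import Dict, List, Set

def peel_bottom_up(parents: Dict[str, Set[str]]) -> List[List[str]]:
    """Repeatedly remove the current leaves of a known DAG (illustration only).

    `parents` maps each node to the set of its direct parents.
    Returns the peeled layers, leaves first.
    """
    remaining = set(parents)
    layers = []
    while remaining:
        # A node is a leaf if no remaining node lists it as a parent.
        non_leaves = set()
        for child in remaining:
            non_leaves |= parents[child] & remaining
        leaves = remaining - non_leaves
        if not leaves:
            raise ValueError("input graph contains a cycle")
        layers.append(sorted(leaves))
        remaining -= leaves
    return layers

def ancestor_sets(parents: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Ancestral super-graph: every node mapped to all of its ancestors.

    Assumes the input is acyclic.
    """
    memo: Dict[str, Set[str]] = {}

    def visit(node: str) -> Set[str]:
        if node not in memo:
            found: Set[str] = set()
            for parent in parents[node]:
                found.add(parent)
                found |= visit(parent)
            memo[node] = found
        return memo[node]

    return {node: visit(node) for node in parents}

# Hypothetical toy graph: X1 -> X2 -> X3 -> Y, plus X1 -> X3.
toy_parents = {
    "X1": set(),
    "X2": {"X1"},
    "X3": {"X1", "X2"},
    "Y": {"X3"},
}
layers = peel_bottom_up(toy_parents)
ancestors = ancestor_sets(toy_parents)
```

In this toy graph the ancestral relation of `Y` includes `X1` and `X2` even though neither is a parent of `Y`, which is why a second step is needed to separate parent-child effects from the broader ancestral relationships.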
