Wang Jiachen, Zhang Yuelei, Chen Luonan, Liu Xiaoping
Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China.
Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, 200031, China.
Adv Sci (Weinh). 2024 Dec;11(46):e2409170. doi: 10.1002/advs.202409170. Epub 2024 Oct 23.
Causally quantifying molecular regulation between genes or other molecules from observed data is crucial for elucidating, at the network level, the molecular mechanisms underlying biological processes. At present, most methods for inferring gene regulatory and other biological networks rely on association studies or observational causal-analysis approaches. This study introduces Causal Diffusion Do-calculus (CDD) analysis, a deep learning approach that combines intervention operations and diffusion models within a do-calculus framework to infer causal networks between molecules. Because of its intervention operations, CDD can extract causal relations from observed data, which significantly enhances the accuracy and generalizability of causal network inference. CDD was applied to both simulated data and real omics data, and the results demonstrate that it outperforms existing methods in accurately inferring gene regulatory networks and in identifying causal links from genes to disease phenotypes. In particular, compared with the Mendelian randomization algorithm and other existing methods, CDD reliably identifies the disease genes or molecules of complex diseases with better performance. In addition, a causal analysis between various diseases and potential factors in different populations from the UK Biobank database was conducted, which further validated the effectiveness of CDD.
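The distinction the abstract draws between association-based inference and intervention operations can be illustrated with a toy example. The sketch below is not the CDD method and uses no diffusion model; it is a minimal linear structural causal model with arbitrary, assumed coefficients, in which a hidden confounder Z drives both X and Y. The function names `simulate` and `slope` are hypothetical. It compares the observational regression slope of Y on X with the effect of setting X by intervention, do(X = x).

```python
import random

def simulate(n, do_x=None, seed=0):
    """Sample from a toy SCM (assumed for illustration): Z -> X, Z -> Y, X -> Y.

    If do_x is given, X is set by intervention and no longer depends on Z.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                      # hidden confounder
        if do_x is None:
            x = 2.0 * z + rng.gauss(0, 1)        # observational mechanism for X
        else:
            x = do_x                             # do(X = do_x): edge Z -> X is cut
        y = 0.5 * x + 3.0 * z + rng.gauss(0, 1)  # true causal effect of X on Y is 0.5
        xs.append(x)
        ys.append(y)
    return xs, ys

def mean(values):
    return sum(values) / len(values)

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Association: regress Y on X in purely observational data.
obs_x, obs_y = simulate(20000)
assoc = slope(obs_x, obs_y)

# Intervention: compare E[Y | do(X=1)] with E[Y | do(X=0)].
_, y_do1 = simulate(20000, do_x=1.0, seed=1)
_, y_do0 = simulate(20000, do_x=0.0, seed=2)
causal = mean(y_do1) - mean(y_do0)

print(f"observational slope: {assoc:.2f}")
print(f"interventional effect: {causal:.2f}")
```

With these assumed coefficients the observational slope is inflated by the confounder (its population value is 1.7), whereas the interventional contrast estimates the true effect of 0.5. Real omics data rarely permit direct intervention, which is why approaches that emulate intervention from observed data are of interest.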