Learning causal networks with latent variables from multivariate information in genomic data.

Authors

Verny Louis, Sella Nadir, Affeldt Séverine, Singh Param Priya, Isambert Hervé

Affiliations

Institut Curie, PSL Research University, CNRS, UMR168, Paris, France.

Sorbonne Universités, UPMC Univ Paris 06, Paris, France.

Publication information

PLoS Comput Biol. 2017 Oct 2;13(10):e1005662. doi: 10.1371/journal.pcbi.1005662. eCollection 2017 Oct.

DOI:10.1371/journal.pcbi.1005662

PMID:28968390

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5685645/

Abstract

Learning causal networks from large-scale genomic data remains challenging in the absence of time series or controlled perturbation experiments. We report an information-theoretic method which learns a large class of causal or non-causal graphical models from purely observational data, while including the effects of unobserved latent variables, commonly found in many genomic datasets. Starting from a complete graph, the method iteratively removes dispensable edges, by uncovering significant information contributions from indirect paths, and assesses edge-specific confidences from randomization of available data. The remaining edges are then oriented based on the signature of causality in observational data. The approach and associated algorithm, miic, outperform earlier methods on a broad range of benchmark networks. Causal network reconstructions are presented at different biological size and time scales, from gene regulation in single cells to whole genome duplication in tumor development as well as long-term evolution of vertebrates. Miic is publicly available at https://github.com/miicTeam/MIIC.
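
The edge-removal idea in the abstract (start from a complete graph, then drop edges that an indirect path explains) can be illustrated with a toy sketch. This is not the miic algorithm: it uses plain empirical conditional mutual information with a fixed cutoff, and it neither orients edges nor handles latent variables. The function names, the `threshold` and `max_cond` parameters, and the simulated chain X -> Y -> Z are all assumptions made for illustration.

```python
import itertools
import math
import random
from collections import Counter


def entropy(columns):
    """Empirical joint Shannon entropy (in nats) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * math.log(c / n) for c in Counter(rows).values())


def cond_mutual_info(x, y, z=()):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z); z is a sequence of columns."""
    z = list(z)
    h_z = entropy(z) if z else 0.0
    return entropy([x] + z) + entropy([y] + z) - entropy([x, y] + z) - h_z


def prune_skeleton(data, threshold=0.02, max_cond=1):
    """Start from the complete graph and drop an edge when some small
    conditioning set makes the conditional mutual information negligible.
    The fixed threshold is an illustrative assumption, not miic's criterion."""
    names = sorted(data)
    edges = {frozenset(p) for p in itertools.combinations(names, 2)}
    for a, b in itertools.combinations(names, 2):
        others = [v for v in names if v not in (a, b)]
        for k in range(max_cond + 1):
            if any(
                cond_mutual_info(data[a], data[b], [data[v] for v in s]) < threshold
                for s in itertools.combinations(others, k)
            ):
                edges.discard(frozenset((a, b)))
                break
    return edges


def make_chain(n=5000, noise=0.1, seed=0):
    """Simulate binary samples from a hypothetical chain X -> Y -> Z."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    y = [xi ^ (rng.random() < noise) for xi in x]  # copy X, flip with prob. noise
    z = [yi ^ (rng.random() < noise) for yi in y]  # copy Y, flip with prob. noise
    return {"X": x, "Y": y, "Z": z}


edges = prune_skeleton(make_chain())
print(sorted(tuple(sorted(e)) for e in edges))
```

On the simulated chain, X and Z are strongly dependent on their own but nearly independent once Y is known, so the X-Z edge is the one that gets dropped while X-Y and Y-Z are retained.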

Figure image: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c71/5685645/22483e0fe81e/pcbi.1005662.g001.jpg

Similar articles

MIIC online: a web server to reconstruct causal or non-causal networks from non-perturbative data.

Bioinformatics. 2018 Jul 1;34(13):2311-2313. doi: 10.1093/bioinformatics/btx844.

3off2: A network reconstruction algorithm based on 2-point and 3-point information statistics.

BMC Bioinformatics. 2016 Jan 20;17 Suppl 2(Suppl 2):12. doi: 10.1186/s12859-015-0856-x.

A Multiattribute Gaussian Graphical Model for Inferring Multiscale Regulatory Networks: An Application in Breast Cancer.

Methods Mol Biol. 2019;1883:143-160. doi: 10.1007/978-1-4939-8882-2_6.

Whole-Transcriptome Causal Network Inference with Genomic and Transcriptomic Data.

Methods Mol Biol. 2019;1883:95-109. doi: 10.1007/978-1-4939-8882-2_4.

Time Delayed Causal Gene Regulatory Network Inference with Hidden Common Causes.

PLoS One. 2015 Sep 22;10(9):e0138596. doi: 10.1371/journal.pone.0138596. eCollection 2015.

A Bayesian framework for the inference of gene regulatory networks from time and pseudo-time series data.

Bioinformatics. 2018 Mar 15;34(6):964-970. doi: 10.1093/bioinformatics/btx605.

Integration of somatic mutation, expression and functional data reveals potential driver genes predictive of breast cancer survival.

Bioinformatics. 2015 Aug 15;31(16):2607-13. doi: 10.1093/bioinformatics/btv164. Epub 2015 Mar 24.

NETBAGs: a network-based clustering approach with gene signatures for cancer subtyping analysis.

Biomark Med. 2015;9(11):1053-65. doi: 10.2217/bmm.15.96. Epub 2015 Oct 26.

From Correlation to Causality: Statistical Approaches to Learning Regulatory Relationships in Large-Scale Biomolecular Investigations.

J Proteome Res. 2016 Mar 4;15(3):683-90. doi: 10.1021/acs.jproteome.5b00911. Epub 2016 Jan 12.

Cited by

CausalCCC: a web server to explore intracellular causal pathways enabling cell-cell communication.

Nucleic Acids Res. 2025 Jul 7;53(W1):W125-W131. doi: 10.1093/nar/gkaf404.

Preserving information while respecting privacy through an information theoretic framework for synthetic health data generation.

NPJ Digit Med. 2025 Jan 23;8(1):49. doi: 10.1038/s41746-025-01431-6.

CausalXtract, a flexible pipeline to extract causal effects from live-cell time-lapse imaging data.

Elife. 2025 Jan 17;13:RP95485. doi: 10.7554/eLife.95485.

Transcriptome data are insufficient to control false discoveries in regulatory network inference.

Cell Syst. 2024 Aug 21;15(8):709-724.e13. doi: 10.1016/j.cels.2024.07.006.

Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients.

iScience. 2024 Apr 16;27(5):109736. doi: 10.1016/j.isci.2024.109736. eCollection 2024 May 17.

A multistep computational approach reveals a neuro-mesenchymal cell population in the embryonic hematopoietic stem cell niche.

Development. 2024 Apr 1;151(7). doi: 10.1242/dev.202614. Epub 2024 Apr 4.

Granger causality analysis for calcium transients in neuronal networks, challenges and improvements.

Elife. 2023 Feb 7;12:e81279. doi: 10.7554/eLife.81279.

Single Cell Transcriptomics to Understand HSC Heterogeneity and Its Evolution upon Aging.

Cells. 2022 Oct 4;11(19):3125. doi: 10.3390/cells11193125.

Interactive exploration of a global clinical network from a large breast cancer cohort.

NPJ Digit Med. 2022 Aug 10;5(1):113. doi: 10.1038/s41746-022-00647-0.

Single-Cell RNA-Seq Reveals the Promoting Role of Ferroptosis Tendency During Lung Adenocarcinoma EMT Progression.

Front Cell Dev Biol. 2022 Jan 20;9:822315. doi: 10.3389/fcell.2021.822315. eCollection 2021.

References cited in this article

Methods for causal inference from gene perturbation experiments and validation.

Proc Natl Acad Sci U S A. 2016 Jul 5;113(27):7361-8. doi: 10.1073/pnas.1510493113.

Inferring causal molecular networks: empirical assessment through a community-based effort.

Nat Methods. 2016 Apr;13(4):310-8. doi: 10.1038/nmeth.3773. Epub 2016 Feb 22.

3off2: A network reconstruction algorithm based on 2-point and 3-point information statistics.

BMC Bioinformatics. 2016 Jan 20;17 Suppl 2(Suppl 2):12. doi: 10.1186/s12859-015-0856-x.

Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

PLoS Comput Biol. 2015 Jul 16;11(7):e1004394. doi: 10.1371/journal.pcbi.1004394. eCollection 2015 Jul.

Chromosomal instability, tolerance of mitotic errors and multidrug resistance are promoted by tetraploidization in human cells.

Cell Cycle. 2015;14(17):2810-20. doi: 10.1080/15384101.2015.1068482.

APOBEC Enzymes: Mutagenic Fuel for Cancer Evolution and Heterogeneity.

Cancer Discov. 2015 Jul;5(7):704-12. doi: 10.1158/2159-8290.CD-15-0344. Epub 2015 Jun 19.

Regulation of nucleotide metabolism by mutant p53 contributes to its gain-of-function activities.

Nat Commun. 2015 Jun 12;6:7389. doi: 10.1038/ncomms8389.

Decoding the regulatory network of early blood development from single-cell gene expression measurements.

Nat Biotechnol. 2015 Mar;33(3):269-276. doi: 10.1038/nbt.3154. Epub 2015 Feb 9.

COSMIC: exploring the world's knowledge of somatic mutations in human cancer.

Nucleic Acids Res. 2015 Jan;43(Database issue):D805-11. doi: 10.1093/nar/gku1075. Epub 2014 Oct 29.

The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.

Nucleic Acids Res. 2015 Jan;43(Database issue):D726-36. doi: 10.1093/nar/gku967. Epub 2014 Oct 27.
