Verny Louis, Sella Nadir, Affeldt Séverine, Singh Param Priya, Isambert Hervé
Institut Curie, PSL Research University, CNRS, UMR168, Paris, France.
Sorbonne Universités, UPMC Univ Paris 06, Paris, France.
PLoS Comput Biol. 2017 Oct 2;13(10):e1005662. doi: 10.1371/journal.pcbi.1005662. eCollection 2017 Oct.
Learning causal networks from large-scale genomic data remains challenging in the absence of time series or controlled perturbation experiments. We report an information-theoretic method that learns a large class of causal or non-causal graphical models from purely observational data, while accounting for the effects of unobserved latent variables, which are common in genomic datasets. Starting from a complete graph, the method iteratively removes dispensable edges by uncovering significant information contributions from indirect paths, and assesses edge-specific confidences by randomizing the available data. The remaining edges are then oriented based on the signature of causality in observational data. The approach and its associated algorithm, miic, outperform earlier methods on a broad range of benchmark networks. Causal network reconstructions are presented across different biological size and time scales, from gene regulation in single cells to whole genome duplication in tumor development, as well as the long-term evolution of vertebrates. Miic is publicly available at https://github.com/miicTeam/MIIC.
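To make the information-theoretic steps in the abstract more concrete, the sketch below computes plug-in estimates of mutual information I(X;Y), conditional mutual information I(X;Y|Z) and three-point information I(X;Y;Z) = I(X;Y) - I(X;Y|Z) on small discrete samples. It is an illustration under stated assumptions and is not miic's implementation. The helper names `edge_is_dispensable` and `edge_confidence` are hypothetical. The removal rule (drop X-Y when a single Z explains away their shared information), the sign-based orientation reading (negative three-point information suggesting a collider X -> Z <- Y), and the simple permutation score are all assumed for illustration. The sketch also omits the finite-sample corrections and multi-variable conditioning sets that a real implementation would need.

```python
import random
from collections import Counter
from math import log


def entropy(samples):
    """Plug-in Shannon entropy (in nats) of a list of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * log(c / n) for c in Counter(samples).values())


def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def cond_mutual_info(xs, ys, zs):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(list(zip(xs, zs))) + entropy(list(zip(ys, zs)))
            - entropy(list(zip(xs, ys, zs))) - entropy(zs))


def three_point_info(xs, ys, zs):
    """I(X;Y;Z) = I(X;Y) - I(X;Y|Z); assumed here: negative values hint at X -> Z <- Y."""
    return mutual_info(xs, ys) - cond_mutual_info(xs, ys, zs)


def edge_is_dispensable(xs, ys, zs, threshold=1e-9):
    """Hypothetical rule: drop edge X-Y if Z explains away (almost) all shared information."""
    return cond_mutual_info(xs, ys, zs) <= threshold


def edge_confidence(xs, ys, n_shuffles=200, seed=0):
    """Hypothetical permutation score: fraction of shuffles of ys whose MI falls below the observed MI."""
    rng = random.Random(seed)
    observed = mutual_info(xs, ys)
    shuffled = list(ys)
    below = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # shuffling ys breaks any dependence on xs
        if mutual_info(xs, shuffled) < observed - 1e-12:
            below += 1
    return below / n_shuffles


# Chain X -> Z -> Y: Z copies X and Y copies Z, so Z carries everything X says about Y.
chain_x = [0, 0, 1, 1]
chain_z = list(chain_x)
chain_y = list(chain_z)

# Collider X -> Z <- Y: X and Y are independent bits (full factorial), and Z = X + Y.
col_x = [0, 0, 1, 1]
col_y = [0, 1, 0, 1]
col_z = [a + b for a, b in zip(col_x, col_y)]

if __name__ == "__main__":
    print(edge_is_dispensable(chain_x, chain_y, chain_z))  # True: X-Y is an indirect path via Z
    print(three_point_info(chain_x, chain_y, chain_z) > 0)  # True: non-collider
    print(three_point_info(col_x, col_y, col_z) < 0)  # True: collider signature
    print(edge_confidence([0, 1] * 50, [0, 1] * 50))
```

In the chain, conditioning on Z drives I(X;Y|Z) to zero, so the direct X-Y edge is flagged as dispensable. In the collider, X and Y are marginally independent but become dependent once Z is known, which makes the three-point information negative.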